Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

General Discussions

Discuss any topics that are not product-specific here.

Regular Expression Help

alteryx_user_9999
6 - Meteoroid

I need to parse the following strings: 

 

Hodge D.R., Arizona State University (ASU), Phoenix, AZ, United States, Program for Research on Religion and Urban Civil Society, University of Pennsylvania, Philadelphia, United States; Carpenter B.M., ASU, Phoenix, United States; Yepez R.A., the River Source, Arizona City, AZ, United States; Lietz B.C., ASU, Mail Code 3920, 411 N, United States

 

Sun Y., Tencent Weixin Group; Lu T., W. P. Carey School of Business, Arizona State University, United States; Wang C., Guanghua School of Management, Peking University, China; Li Y.; Fu H., School of Management, Fudan University, China; Dong J., Tencent Weixin Group; Xu Y., School of Management, Fudan University, China

 

Chandwani A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA; Mallik A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA; Akturk A., CoolCAD Electronics LLC, USA

 

 

into rows like 

 

Hodge D.R., Arizona State University (ASU), Phoenix, AZ, United States, Program for Research on Religion and Urban Civil Society, University of Pennsylvania, Philadelphia, United States

 

Carpenter B.M., ASU, Phoenix, United States

 

Yepez R.A., the River Source, Arizona City, AZ, United States

 

Lietz B.C., ASU, Mail Code 3920, 411 N, United States

 

Sun Y., Tencent Weixin Group

 

Lu T., W. P. Carey School of Business, Arizona State University, United States

 

Wang C., Guanghua School of Management, Peking University, China

 

Li Y.; Fu H., School of Management, Fudan University, China

 

Dong J., Tencent Weixin Group

 

Xu Y., School of Management, Fudan University, China

 

Chandwani A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA

 

Mallik A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA

 

Akturk A., CoolCAD Electronics LLC, USA

 

Any ideas on how to parse each string as so in a uniform fashion? I've been able to use regular expression to parse different subsets of these strings, but not all of them the way just described. Any help is appreciated. 

3 REPLIES 3
caltang
17 - Castor
17 - Castor

At first glance, you can use ";" as the delimiter with Text to Columns tool to parse them into rows. 

 

But if you do that, you will get some issues with:

  • Li Y.; Fu H., School of Management, Fudan University, China
  • Chandwani A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA;
  • Mallik A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA

image.png

 

Anyway, can you confirm the expected output again? Because some have ";" that indicates a split.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
alteryx_user_9999
6 - Meteoroid

Hello, 

 

Part of the issue (and challenge) with this dataset is that the semi-colon works great as a delimiter in most cases, but doesn't work for the following just given. I've been trying to use regular expressions to handle these exceptions - have had some success - but am stumped on how to handle all four of these cases and split them how I described above. Any further suggestions is appreciated. Thank you. 

caltang
17 - Castor
17 - Castor

REGEX is pattern based, and in this case, your data isn't really following a pattern. It's a list of instituions that have differing lengths and placement of special characters and delimiters. 

 

My approach is splitting them based on ";" as a delimiter, and that has around 80% success rate. You'll need some way or logic to define the bits that were split "incorrectly" to get what you want. Otherwise you can go with my method, and then have some room to manually adjust what you think is right/wrong. 

 

That is the extent of my thinking on this, hope someone else can contribute. 

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
Labels
Top Solution Authors