I need to parse the following strings:
Hodge D.R., Arizona State University (ASU), Phoenix, AZ, United States, Program for Research on Religion and Urban Civil Society, University of Pennsylvania, Philadelphia, United States; Carpenter B.M., ASU, Phoenix, United States; Yepez R.A., the River Source, Arizona City, AZ, United States; Lietz B.C., ASU, Mail Code 3920, 411 N, United States
Sun Y., Tencent Weixin Group; Lu T., W. P. Carey School of Business, Arizona State University, United States; Wang C., Guanghua School of Management, Peking University, China; Li Y.; Fu H., School of Management, Fudan University, China; Dong J., Tencent Weixin Group; Xu Y., School of Management, Fudan University, China
Chandwani A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA; Mallik A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA; Akturk A., CoolCAD Electronics LLC, USA
into rows like
Hodge D.R., Arizona State University (ASU), Phoenix, AZ, United States, Program for Research on Religion and Urban Civil Society, University of Pennsylvania, Philadelphia, United States
Carpenter B.M., ASU, Phoenix, United States
Yepez R.A., the River Source, Arizona City, AZ, United States
Lietz B.C., ASU, Mail Code 3920, 411 N, United States
Sun Y., Tencent Weixin Group
Lu T., W. P. Carey School of Business, Arizona State University, United States
Wang C., Guanghua School of Management, Peking University, China
Li Y.; Fu H., School of Management, Fudan University, China
Dong J., Tencent Weixin Group
Xu Y., School of Management, Fudan University, China
Chandwani A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA
Mallik A., Power Electronics and Control Engineering Laboratory (PEACE); Ira A. Fulton School of Engineering, Arizona State University, USA
Akturk A., CoolCAD Electronics LLC, USA
Any ideas on how to parse each string as so in a uniform fashion? I've been able to use regular expression to parse different subsets of these strings, but not all of them the way just described. Any help is appreciated.
At first glance, you can use ";" as the delimiter with Text to Columns tool to parse them into rows.
But if you do that, you will get some issues with:
Anyway, can you confirm the expected output again? Because some have ";" that indicates a split.
Hello,
Part of the issue (and challenge) with this dataset is that the semi-colon works great as a delimiter in most cases, but doesn't work for the following just given. I've been trying to use regular expressions to handle these exceptions - have had some success - but am stumped on how to handle all four of these cases and split them how I described above. Any further suggestions is appreciated. Thank you.
REGEX is pattern based, and in this case, your data isn't really following a pattern. It's a list of instituions that have differing lengths and placement of special characters and delimiters.
My approach is splitting them based on ";" as a delimiter, and that has around 80% success rate. You'll need some way or logic to define the bits that were split "incorrectly" to get what you want. Otherwise you can go with my method, and then have some room to manually adjust what you think is right/wrong.
That is the extent of my thinking on this, hope someone else can contribute.