Hi
I have a csv data input that has irregular delimiters. Most of them are separated by commas, however some cells include commas within themselves. As a result it splits the data out incorrectly. Please see sample data in spreadsheet attached and desired output.
How do I solve this?
Solved! Go to Solution.
Hi @FECL3
Here is a workflow for the task.
Output:
Workflow:
Hope this helps 🙂 Let me know if this works for the larger data.
If this post helps you please mark it as solution. And give a like if you dont mind 😀👍
Thanks for the workflow, I can't seem to extrapolate it to my data source. Do you have a useful online resource that explains the regular expression in the "Format to convert" box?
Hi @FECL3
You will be able to find the meanings/usage of each symbols here.
But building the expression is based on logic
Here is the explanation of the regex i am using. As we already know we want to split it on commas using brackets like () will catch part of the string.
(\w*),(.*),(\w*),([\d/]*),([\d/]*) split
(\w*) - Alpha + numeric zero or more times
(.*) - Any character zero or more times. This takes in commas and space too.
(\w*) - Alpha + numeric zero or more times
([\d/]*) - Numeric + /(backslash) to accommodate dates
([\d/]*) - Numeric + /(backslash) to accommodate dates
This might help
https://community.alteryx.com/t5/Interactive-Lessons/Parsing-Data-with-RegEx/ta-p/441415
https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-RegEx/ta-p/37689
https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/RegEx-Perl-Syntax-Guide/ta-p/1288
If you tell me whats the issue you are facing i can help you out.
Happy to help 🙂
Cheers and Happy Analyzing 😀
How would I parse it if there was a "-" in the middle of one of the cells. e.g. "....,Arizona - Phoenix,......" and I want the output to be "Arizona - Phoenix"?
Same question for if there was a "." in the middle of one of the cells. e.g. "A.I.M. Inc" and I want the output to be "A.I.M. Inc"?
The data in confidential and so I cannot share
Hi @FECL3
For first case you can use ([\w\s-]*) for second you can use ([\w\s\.]*) check and let me know.
It also vastly depends on the text before and after check the above and see how to accommodate it.
Hope this helps 🙂
Hello @FECL3
RegEx is really a powerfull tool. And I love it so much. But if you are not comfortable with the regular Expression (we all are at the begining 😀 ) you can use an easiest way with the Text to column tool.
This tools is here to split one source column (here "Field1") in N target columns (here 5). To know what is in each of these column it is looking for a Delimeters (here our comma).
So regarding questions you are asking it should work as soon as there is no comma in your fields but only as a separator.
Then you have to rename your 5 columns.
And it gives the same result than @atcodedog05
Hope it helps
Hi @StephaneP
Apparently in his first scenario there is commas (,) in the value of coloum. Hence regex was a suggestable idea 🙂
Yes you are right, If there is a comma in the value of the column, Text to column will not work as expected and you have to discover the magic of Regex.
By the way here is a web site to test and understand your RegEx code. Very usefull. It explains how is interpret your code. Very funny sometime to realize how far you are from what you expect. 😋
Another one that I haven't tested in detail but looks funny: You write in plain english what you want to do and it translate it in Regex
But the better trick of course is to have your own pet to help you.