Hi All. I've been given some data that should have a simple format. Here are the three fields: ID1, ID2, Comment
Both IDs are numbers, and the Comment is free text. The problem is that the comment field hasn't been wrapped in double quotes, so when you parse it, things like commas and carriage returns are interpreted literally rather than escaped as they should in a csv wrapped in "".
I opened the data in Alteryx using the '\0' csv delimeter trick. Here is a good example of what I'm dealing with:
Lines 1 and 5 are OK. Lines 2, 3 and 5 have some problems:
This is possibly a quick job in Regex? Ideally I'd like the data to look like this:
I can see that if a line start with a number it's probably an ID, and if it starts with a non-numerical char it's probably a comment that needs to be moved up. But I don't know enough about regex to detect these things.
Finally, the two ID fields are numeric, but of varied lengths, so I can't just judge a solution based on character positions.
If anyone can point out how to solve this problem I'd be very grateful! Thanks
Solved! Go to Solution.
A MultiRow formula can be used to join the relevant lines based on a regular expression matching 'number,number,everything else' ... then split to 3 columns. This will give you everything but will have duplicates with the appended and non-appended rows... to remedy that, sort the third column descending prior to doing a "unique" on the first towo columns: the descending sort will ensure that "unique" grabs only the longer of the two rows. A final Formula can add in the quotes on the parsed third field. See attached - hope it helps.
Thank you, this looks incredible!
Unfortunately I've discovered that some comments span many lines, depending on how many times the user hit enter when typing their helpful feedback comments! Example attached (I have amended the text input). Since these span more rows than r-1, r, or r+1, can they still be handled by a multi-row formula?
Thank you
EDIT: actually the given solution works for multiple lines since it just continues appending them as it steps through the MultiRow formula.
However, you may need to add an initial Select tool, in order to expand the allowed field size, so that the appends don't get truncated... (see new attachment).