Hi All. I've been given some data that should have a simple format. Here are the three fields: ID1, ID2, Comment
Both IDs are numbers, and the Comment is free text. The problem is that the comment field hasn't been wrapped in double quotes, so when you parse it, things like commas and carriage returns are interpreted literally rather than escaped as they should in a csv wrapped in "".
I opened the data in Alteryx using the '\0' csv delimeter trick. Here is a good example of what I'm dealing with:
- 13353,5546,Proper line of feedback with no commas in comment seems to work OK
- 85575,348568,I think it would be better, I mean, why can't we do the following.
-
- The process should be streamlined
- 53343,8923,Proper line of feedback
Lines 1 and 5 are OK. Lines 2, 3 and 5 have some problems:
- Line 2 has commas in the text comment. Since commas are also used to delimit the fields, we need to wrap everything after the second comma in double quotes.
- Line 3 is an extra carriage return preserved in the text comment
- Line 4 is a continuation of the comment belonging to the IDs in line 2. How can these comments be appended to the previous line, replacing a carriage return with a space?
This is possibly a quick job in Regex? Ideally I'd like the data to look like this:
- 13353,5546,"Proper line of feedback"
- 85575,348568,"I think it would be better, I mean, why can't we do the following. The process should be streamlined"
- 53343,8923,"Proper line of feedback"
I can see that if a line start with a number it's probably an ID, and if it starts with a non-numerical char it's probably a comment that needs to be moved up. But I don't know enough about regex to detect these things.
Finally, the two ID fields are numeric, but of varied lengths, so I can't just judge a solution based on character positions.
If anyone can point out how to solve this problem I'd be very grateful! Thanks