I have a large .csv file that I am trying to read in through the Input Data tool. There are carriage returns in random rows throughout the file, and the Input Data tool is treating each of those as a new row. I have tried selecting "AMP Only: Allow Newlines in Quoted Fields", but it is still creating a new record for each break.
Here are my input settings:
I've added a sample of the data and you can see in rows 4 - 8, those should all be on the same line.
Any help is much appreciated.
Hi @Andy_Hines
This is a tough one. Is the text with the line breaks always something like
"Automated refund
Reversed by ctrl# xxxxxx
update refund"
Dan
It does seem to always say something like "Automated refund" - yes.
Hi @Andy_Hines
If you can figure out the pattern to these messages, specifically how they start (i.e. "Automated refund") and how they end ("update refund"), you can concatenate the entire file into a single string by replacing the CRLF with "|", and then use a regex replace to get rid of the pipes between the start and end sequences. Then split the resulting string to rows on the "|" and split to columns on ",". You should be able to get the file into a reasonable format.
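Outside of the workflow itself, the same logic can be sketched in Python. This is only an illustration of the approach, not an Alteryx tool: the sample data is made up, and "Automated refund" / "update refund" are assumed to be reliable start and end patterns.

```python
import re

# Hypothetical sample: a CSV whose last field sometimes contains
# embedded line breaks between "Automated refund" and "update refund".
raw = (
    "id,amount,note\n"
    "1,10.50,ok\n"
    "2,99.00,Automated refund\n"
    "Reversed by ctrl# xxxxxx\n"
    "update refund\n"
    "3,5.25,ok\n"
)

# 1. Concatenate the whole file into one string, replacing CRLF with "|".
joined = raw.replace("\r\n", "|").replace("\n", "|")

# 2. Remove the pipes that fall between the start pattern and the end
#    pattern -- those pipes are the stray line breaks.
fixed = re.sub(
    r"Automated refund\|.*?update refund",
    lambda m: m.group(0).replace("|", " "),
    joined,
)

# 3. Split back to rows on "|", then to columns on ",".
rows = [line.split(",") for line in fixed.split("|") if line]
```

The non-greedy `.*?` matters: with a greedy `.*`, a file containing several of these messages would be collapsed into one.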
Dan
Hey @Andy_Hines!
As Dan correctly stated, the task would be much easier if we can identify specific starting and ending patterns. However, I found a solution that worked on the sample and may be applicable to the whole dataset if the schema is the same.
The attached workflow checks that the last three characters of the cell are a dot followed by two digits. If this ending pattern persists throughout the whole dataset, we can combine the broken rows together and "fix" the csv file.
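For anyone following along outside Alteryx, here is a minimal Python sketch of that end-of-record check. The sample lines and the ".NN" price ending are assumptions; any line that does not end with a dot and two digits is treated as a continuation of the previous record.

```python
import re

# Hypothetical sample: each complete record should end with a price
# like ".50" or ".00"; other lines are continuations of the row above.
lines = [
    "1,ok,10.50",
    "2,Automated refund",
    "Reversed by ctrl# xxxxxx",
    "update refund,99.00",
    "3,ok,5.25",
]

# A dot followed by exactly two digits at the end of the record.
ends_ok = re.compile(r"\.\d\d$")

records = []
for line in lines:
    if records and not ends_ok.search(records[-1]):
        # Previous record is incomplete: glue this line onto it.
        records[-1] += " " + line
    else:
        records.append(line)
```

This is the same "fix the file before parsing it" idea as Dan's, just keyed on the ending pattern instead of the start-and-end pair.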
I understand that it is not the ideal solution, and more pertains to cleaning the dataset rather than setting up the input tool, but I hope you can find this useful.