Hi everyone,
I have many .txt and .csv files. Some of them contain records with more delimiters than expected, and I get this message:
"Record #1219: Tool #2: Too many fields in record #57"
How can I ignore the bad records and read the rest of the file to the end?
@binuacs
Now consider:
Col1,Col2
22@22.com,qwerre
44@234.com,loplopl
11@11.com,123$,%()
78@78.com,98/*-+
The delimiter is a comma, but Col2 itself contains additional commas. While reading the file, I get that message.
@Saraabdi955 Use \0 as the delimiter instead of , and also uncheck "First row contains field names" (the 6th option in the Input tool configuration). Then use the Text to Columns tool to split the records.
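Outside Alteryx, the same idea can be sketched in a few lines of Python (purely illustrative - the sample lines below are made up from the dataset above): read each line whole, as the \0 delimiter does, then split on the first comma only so any extra commas stay in the second field.

```python
# Read each line as a single field (equivalent to using \0 as the
# delimiter), then split on the FIRST comma only; extra commas
# remain inside the second column.
lines = [
    "22@22.com,qwerre",
    "11@11.com,123$,%()",
]
rows = [line.split(",", 1) for line in lines]  # maxsplit=1 -> 2 columns
# rows[1] == ["11@11.com", "123$,%()"]
```

This mirrors restricting the Text to Columns tool to 2 columns and leaving the extra in the last column.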
That's not quite right.
Please see the attached dataset.
What should we do when a record has more delimiters than expected?
Hi @Saraabdi955
Is it because you have a comma in the password section? Using @binuacs's solution you can get it to work - just restrict the Text to Columns tool to 2 columns only, leaving the extras in the last column:
Your output is this:
If this isn't right, please provide detail on how you expect the output to look.
Thanks @davidskaife
It works, but not quite.
If there were another column after the password, this approach wouldn't work.
Hi @Saraabdi955
Try this RegEx parse; I've tested it with an additional column after the password:
Here is the setting for the Regex:
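The screenshot with the exact RegEx settings isn't visible here, but a pattern along these lines illustrates the idea (an assumption, not the tested configuration): anchor the first and last fields so they contain no commas, and let a greedy middle group absorb everything in between, commas included.

```python
import re

# Hypothetical pattern for email,password,extra where only the
# password may contain commas: the first and last groups are
# comma-free, the greedy middle group keeps any embedded commas.
pattern = re.compile(r"^([^,]+),(.*),([^,]+)$")

m = pattern.match("11@11.com,123$,%(),2023-01-01")
email, password, extra = m.groups()
# password == "123$,%()", extra == "2023-01-01"
```

The trailing date column is invented for illustration; the point is that the greedy `(.*)` backtracks only as far as the last comma, so the final column is still parsed cleanly.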
Hi @Saraabdi955
What you have here is a classic example of "dirty data". It's something for which there is no easy or simple fix in any analytics platform - Alteryx, Databricks, Power BI, Informatica, etc. The process of cleaning data can take up to 25% of a data scientist's time.
There's a reason that the Alteryx input tools throw errors and warnings if the schemas of multiple files don't match or if a row in a CSV file doesn't have the same number of delimiters as the header row. The tools were developed on the principle that it is better to exclude suspect data than to include it and have it produce erroneous results.
There are techniques that you can use to clean data, but they need to be applied on a case-by-case basis by someone who has access to the entire input data. @binuacs suggested loading all the files with no delimiters and then parsing the data. @davidskaife suggested using regex to parse some problematic data. What you need to do now is take these methods and work your way through all the data files, applying these and other techniques until you can load and parse all the data.
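To the original question of skipping bad records while still reading the whole file, a minimal triage sketch (plain Python, not Alteryx-specific, with invented sample data from the thread) is to compare each row's field count against the header and set the mismatches aside for review rather than halting:

```python
import csv

# Keep rows whose field count matches the header; collect the
# rest for manual review instead of failing the whole load.
def triage(lines, delimiter=","):
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    good, bad = [], []
    for row in reader:
        (good if len(row) == len(header) else bad).append(row)
    return header, good, bad

header, good, bad = triage([
    "Col1,Col2",
    "22@22.com,qwerre",
    "11@11.com,123$,%()",  # extra comma -> flagged for review
])
# good == [["22@22.com", "qwerre"]]
# bad  == [["11@11.com", "123$", "%()"]]
```

The rejected rows can then be repaired with one of the parsing approaches above and appended back.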
If you have specific questions and you can provide specific examples, please post more questions as you work your way through the data.
Dan