Hi everyone,
I have many .txt and .csv files. Some of them contain records with more delimiters than expected, and I get this message:
"Record #1219: Tool #2: Too many fields in record #57"
How can I ignore the bad records and read the rest of the file to the end?
@binuacs
Now consider:
Col1,Col2
22@22.com,qwerre
44@234.com,loplopl
11@11.com,123$,%()
78@78.com,98/*-+
The delimiter is a comma, but Col2 itself contains additional commas. While reading the file, I get that message.
@Saraabdi955 Use \0 as the delimiter instead of , and also uncheck "First row contains field names" (the 6th option in the Input tool configuration). Then use the Text to Columns tool to split the records.
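Outside Alteryx, the same idea can be sketched in a few lines of Python (purely illustrative - the sample lines below are made up from the dataset above): read each line whole, as the \0 delimiter does, then split on the first comma only so any extra commas stay in the second field.

```python
# Read each line as a single field (equivalent to using \0 as the
# delimiter), then split on the FIRST comma only; extra commas
# remain inside the second column.
lines = [
    "22@22.com,qwerre",
    "11@11.com,123$,%()",
]
rows = [line.split(",", 1) for line in lines]  # maxsplit=1 -> 2 columns
# rows[1] == ["11@11.com", "123$,%()"]
```

This mirrors restricting the Text to Columns tool to 2 columns and leaving the extra in the last column.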
That's not quite right.
Please see the attached dataset.
What should we do when a record has more delimiters than expected?
Hi @Saraabdi955
Is it because you have a comma in the password section? Using @binuacs's solution you can get it to work - just restrict the Text to Columns tool to 2 columns only, leaving the extras in the last column:
Your output is this:
If this isn't right, please provide detail on how you expect the output to look.
Thanks @davidskaife
It works, but not quite.
If there were another column after the password, this approach wouldn't work.
Hi @Saraabdi955
Try this RegEx parse; I've tested it with an additional column after the password:
Here is the setting for the Regex:
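The screenshot with the exact RegEx settings isn't visible here, but a pattern along these lines illustrates the idea (an assumption, not the tested configuration): anchor the first and last fields so they contain no commas, and let a greedy middle group absorb everything in between, commas included.

```python
import re

# Hypothetical pattern for email,password,extra where only the
# password may contain commas: the first and last groups are
# comma-free, the greedy middle group keeps any embedded commas.
pattern = re.compile(r"^([^,]+),(.*),([^,]+)$")

m = pattern.match("11@11.com,123$,%(),2023-01-01")
email, password, extra = m.groups()
# password == "123$,%()", extra == "2023-01-01"
```

The trailing date column is invented for illustration; the point is that the greedy `(.*)` backtracks only as far as the last comma, so the final column is still parsed cleanly.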
Hi @Saraabdi955
What you have here is a classic example of "dirty data". It's something for which there is no easy or simple fix in any analytics platform - Alteryx, Databricks, Power BI, Informatica, etc. The process of cleaning data can take up to 25% of a data scientist's time.
There's a reason that the Alteryx input tools throw errors and warnings if the schemas of multiple files don't match or if a row in a CSV file doesn't have the same number of delimiters as the header row. The tools were developed on the principle that it is better to exclude suspect data than to include it and have it produce erroneous results.
There are techniques that you can use to clean data, but they need to be applied on a case-by-case basis by someone who has access to the entire input data. @binuacs suggested loading all the files with no delimiters and then parsing the data. @davidskaife suggested using regex to parse some problematic data. What you need to do now is take these methods and work your way through all the data files, applying these and other techniques until you can load and parse all the data.
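To the original question of skipping bad records while still reading the whole file, a minimal triage sketch (plain Python, not Alteryx-specific, with invented sample data from the thread) is to compare each row's field count against the header and set the mismatches aside for review rather than halting:

```python
import csv

# Keep rows whose field count matches the header; collect the
# rest for manual review instead of failing the whole load.
def triage(lines, delimiter=","):
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    good, bad = [], []
    for row in reader:
        (good if len(row) == len(header) else bad).append(row)
    return header, good, bad

header, good, bad = triage([
    "Col1,Col2",
    "22@22.com,qwerre",
    "11@11.com,123$,%()",  # extra comma -> flagged for review
])
# good == [["22@22.com", "qwerre"]]
# bad  == [["11@11.com", "123$", "%()"]]
```

The rejected rows can then be repaired with one of the parsing approaches above and appended back.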
If you have specific questions and you can provide specific examples, please post more questions as you work your way through the data.
Dan