I have a CSV file where the fields are not enclosed in quote marks, but some text fields contain quote marks as part of the data - for example, 5" X 3" Cards. Rows with this kind of value don't import correctly. I've tried all four settings of the Input Data tool's Option 9, "Ignore Delimiters in", without success.
The odd thing is that this file opens up perfectly in Excel. So my workaround is to open the CSV in Excel and then save it as an Excel document and use that as my input.
If I were to guess at a solution, it might be:
1. Automate the conversion of the CSV file to an Excel document.
2. Otherwise, transform the CSV file before importing it as data.
3. Try something I haven't thought of.
The attached sample produces the kind of error I'm getting. Any ideas?
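If it helps, option 2 (transforming the CSV before importing it) could be sketched outside Alteryx. Below is a rough Python illustration - not anything built in, and the function name and details are my own - of the lenient splitting rule that seems to make Excel happy: a quote only acts as a field wrapper when it is the very first character of a field; anywhere else it's treated as literal data.

```python
def split_lenient(line, delim=","):
    """Split a CSV line the lenient way: a field is 'quoted' only if it
    begins with a quote; quotes elsewhere are ordinary characters."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':           # field opens with a quote
            i += 1
            buf = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':   # doubled quote = literal quote
                        buf.append('"')
                        i += 2
                    else:                       # closing quote
                        i += 1
                        break
                else:
                    buf.append(line[i])
                    i += 1
            # anything after the closing quote, up to the delimiter, stays in the field
            while i < n and line[i] != delim:
                buf.append(line[i])
                i += 1
            fields.append("".join(buf))
        else:                                   # unquoted field: quotes are just data
            j = line.find(delim, i)
            if j == -1:
                fields.append(line[i:])
                i = n
            else:
                fields.append(line[i:j])
                i = j
        if i < n and line[i] == delim:
            i += 1                              # consume delimiter, continue with next field
        else:
            break
    return fields
```

With this rule, a value like 5" X 3" Cards survives intact because it doesn't start with a quote, while "LABEL, MAILING LOGO, ..." keeps its embedded commas. A preprocessor could apply it line by line and write out a file with an unambiguous delimiter before the Input tool ever sees it.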
I'm taking a look at this, and I notice a strange format in one of these rows when I open the file in Notepad.
Row 5, which is the one I get the error on, has new jersey enclosed in quotes but no quote at the end of the string. If I remove the quotes, the row reads in for me without a warning message in Alteryx.
For me, this row is the culprit, because if I delete it in Notepad, the file also reads in OK.
Thanks for the suggestion, Tom, but neither option will work in my particular case. Some of my values contain commas, some contain quotes, and some a combination of both.
The software that creates this output is proprietary. It does seem to know enough that when the data contains commas, it encloses the entire value in double quotes, e.g. "LABEL, MAILING LOGO, 1000/RL, USE SC#143612".
It looks like you missed the full value of the field. The actual value is "NEW JERSEY" LABEL (5" X 3") (500EA/RL). But you are correct that this is a row that causes the problem. Unfortunately, my actual data set contains several hundred thousand rows, and deleting or editing the offending rows (or even finding them) is not really practical.
I think you might be able to trick the system in the actual Input tool... try this:
1. Change the delimiter to something other than a comma (the default). For example, a pipe | might do the trick, as long as you don't have pipes in your data (seems like you wouldn't based on the sample, but hard to say). I would also uncheck the box for First Row Contains Names. This will bring everything into one field (because it won't find any delimiters to split on).
2. Use the Text to Columns tool on that single field to split it into your 22 columns on the comma delimiter, ignoring delimiters in quotes.
3. Use a Select tool to remove the initial column that had everything in it, then a Dynamic Rename to take the column names from the first row.
See attached... might be a weirdly simple solution that actually does what you need it to do?? To spot check it, you could filter for a column that you know needed to contain specific data (numbers, a word in a specified list, etc.) to make sure there are no more rogue transactions.
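For anyone wanting to reproduce this read-whole-lines-then-split idea outside Alteryx, a minimal Python sketch follows. The sample data is made up; Python's csv module is standing in for step 2, since it likewise ignores commas inside quoted fields and treats quotes that don't open a field as literal data.

```python
import csv
import io

# Step 1 equivalent: bring each line in as one whole field (no delimiter parsing).
# Hypothetical inline data standing in for the real file.
raw = io.StringIO(
    'ID,DESCRIPTION\n'
    '1,5" X 3" Cards\n'
    '2,"LABEL, MAILING LOGO, 1000/RL, USE SC#143612"\n'
)
raw_lines = raw.read().splitlines()

# Step 2 equivalent: split each line on commas, ignoring delimiters in quotes.
rows = [next(csv.reader(io.StringIO(line))) for line in raw_lines]

# Step 3 equivalent: take column names from the first row, keep the rest as data.
header, data = rows[0], rows[1:]
```

The mid-field quotes in 5" X 3" Cards come through untouched, while the quoted value keeps its embedded commas as one field.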
Even better, @Joe_Mako!! That's exactly the "trick" I was looking for. Definitely going to log that one away... (Lordy, I love this place.)
Okay, a 5-tool solution to get to the right number of columns without knowing the number of columns ahead of time: RecordID > Text to Columns (split to rows) > Multi-Row tool w/ Group By RecordID to generate "column numbers" (I'm sure you could use the Tile tool, but I personally like Multi-Row better) > Cross-Tab > Dynamic Rename. Can it be fewer tools than that?? Ready, set, go. :)
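That same 5-tool flow maps step for step onto a pandas sketch, in case anyone wants the idea outside Alteryx. The sample frame is invented, and it splits on bare commas for brevity - for real data you'd swap in a quote-aware split like the Text to Columns step above.

```python
import pandas as pd

# Hypothetical input: every line read in as a single field, column count unknown.
df = pd.DataFrame({"raw": ["ID,DESC,QTY", "1,Cards,500", "2,Labels,1000"]})

df["RecordID"] = range(len(df))                                    # RecordID tool
long = df.assign(value=df["raw"].str.split(",")).explode("value")  # Text to Columns, split to rows
long["colnum"] = long.groupby("RecordID").cumcount()               # Multi-Row: number the columns
wide = long.pivot(index="RecordID", columns="colnum", values="value")  # Cross-Tab
wide.columns = list(wide.iloc[0])                                  # Dynamic Rename: names from row 1
wide = wide.iloc[1:].reset_index(drop=True)                        # drop the header row from the data
```

`explode` plus `cumcount` does the split-to-rows-and-number work, and `pivot` is the Cross-Tab, so the column count falls out of the data rather than being configured anywhere.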