I have a CSV file where the fields are not enclosed in quote marks, but some text fields contain quote marks as part of the data - for example, 5" X 3" Cards. Rows with this kind of value don't import correctly. I've tried all four settings of the Input Data tool's Option 9, "Ignore Delimiters in", without success.
The odd thing is that this file opens up perfectly in Excel. So my workaround is to open the CSV in Excel and then save it as an Excel document and use that as my input.
If I were to guess at a solution, it might be:
1. Automate the conversion of the CSV file to an Excel document.
2. Otherwise, transform the CSV file before importing it as data.
3. Try something I haven't thought of.
The attached sample produces the kind of error I'm getting. Any ideas?
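If it helps, option 2 (transforming the CSV before importing it) could be sketched outside Alteryx. Below is a rough Python illustration - not anything built in, and the function name and details are my own - of the lenient splitting rule that seems to make Excel happy: a quote only acts as a field wrapper when it is the very first character of a field; anywhere else it's treated as literal data.

```python
def split_lenient(line, delim=","):
    """Split a CSV line the lenient way: a field is 'quoted' only if it
    begins with a quote; quotes elsewhere are ordinary characters."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':           # field opens with a quote
            i += 1
            buf = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':   # doubled quote = literal quote
                        buf.append('"')
                        i += 2
                    else:                       # closing quote
                        i += 1
                        break
                else:
                    buf.append(line[i])
                    i += 1
            # anything after the closing quote, up to the delimiter, stays in the field
            while i < n and line[i] != delim:
                buf.append(line[i])
                i += 1
            fields.append("".join(buf))
        else:                                   # unquoted field: quotes are just data
            j = line.find(delim, i)
            if j == -1:
                fields.append(line[i:])
                i = n
            else:
                fields.append(line[i:j])
                i = j
        if i < n and line[i] == delim:
            i += 1                              # consume delimiter, continue with next field
        else:
            break
    return fields
```

With this rule, a value like 5" X 3" Cards survives intact because it doesn't start with a quote, while "LABEL, MAILING LOGO, ..." keeps its embedded commas. A preprocessor could apply it line by line and write out a file with an unambiguous delimiter before the Input tool ever sees it.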
I'm taking a look at this, and I notice a strange format in one of these rows when I open the file in Notepad.
Row 5, which is the one I get the error on, has new jersey enclosed in quotes but no quote at the end of the string. If I remove the quotes, the row reads in for me without a warning message in Alteryx.
For me, this row is the culprit, because if I delete it in Notepad, the file also reads in OK.
Thanks for the suggestion, Tom, but neither option will work in my particular case. Some of my values contain commas, some contain quotes, and some a combination of both.
The software that creates this output is proprietary. It does seem to know enough that when the data contains commas, it encloses the entire value in double quotes, e.g. "LABEL, MAILING LOGO, 1000/RL, USE SC#143612".
It looks like you missed the full value of the field. The actual value is "NEW JERSEY" LABEL (5" X 3") (500EA/RL). But you are correct that this is a row that causes the problem. Unfortunately, my actual data set contains several hundred thousand rows, and deleting or editing the offending rows (or even finding them) is not really practical.
I think you might be able to trick the system in the actual Input tool... try this:
1. Change the delimiter to something other than a comma (the default). For example, a pipe | might do the trick, as long as you don't have pipes in your data (seems like you wouldn't based on the sample, but hard to say). I would also uncheck the box for First Row Contains Names. This will bring everything into one field (because it won't find any delimiters to split on).
2. Use the Text to Columns tool on that single field to split it into your 22 columns on the comma delimiter, ignoring delimiters in quotes.
3. Use a Select tool to remove the initial column that had everything in it, then a Dynamic Rename to take the column names from the first row.
See attached... might be a weirdly simple solution that actually does what you need it to do?? To spot check it, you could filter for a column that you know needed to contain specific data (numbers, a word in a specified list, etc.) to make sure there are no more rogue transactions.
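For anyone wanting to reproduce this read-whole-lines-then-split idea outside Alteryx, a minimal Python sketch follows. The sample data is made up; Python's csv module is standing in for step 2, since it likewise ignores commas inside quoted fields and treats quotes that don't open a field as literal data.

```python
import csv
import io

# Step 1 equivalent: bring each line in as one whole field (no delimiter parsing).
# Hypothetical inline data standing in for the real file.
raw = io.StringIO(
    'ID,DESCRIPTION\n'
    '1,5" X 3" Cards\n'
    '2,"LABEL, MAILING LOGO, 1000/RL, USE SC#143612"\n'
)
raw_lines = raw.read().splitlines()

# Step 2 equivalent: split each line on commas, ignoring delimiters in quotes.
rows = [next(csv.reader(io.StringIO(line))) for line in raw_lines]

# Step 3 equivalent: take column names from the first row, keep the rest as data.
header, data = rows[0], rows[1:]
```

The mid-field quotes in 5" X 3" Cards come through untouched, while the quoted value keeps its embedded commas as one field.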
Even better, @Joe_Mako!! That's exactly the "trick" I was looking for. Definitely going to log that one away... (Lordy, I love this place.)
Okay, a 5-tool solution to get to the right number of columns without knowing the number of columns ahead of time: RecordID > Text to Columns (split to rows) > Multi-Row tool w/ Group By RecordID to generate "column numbers" (I'm sure you could use the Tile tool, but I personally like Multi-Row better) > Cross-Tab > Dynamic Rename. Can it be fewer tools than that?? Ready, set, go. :)
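That same 5-tool flow maps step for step onto a pandas sketch, in case anyone wants the idea outside Alteryx. The sample frame is invented, and it splits on bare commas for brevity - for real data you'd swap in a quote-aware split like the Text to Columns step above.

```python
import pandas as pd

# Hypothetical input: every line read in as a single field, column count unknown.
df = pd.DataFrame({"raw": ["ID,DESC,QTY", "1,Cards,500", "2,Labels,1000"]})

df["RecordID"] = range(len(df))                                    # RecordID tool
long = df.assign(value=df["raw"].str.split(",")).explode("value")  # Text to Columns, split to rows
long["colnum"] = long.groupby("RecordID").cumcount()               # Multi-Row: number the columns
wide = long.pivot(index="RecordID", columns="colnum", values="value")  # Cross-Tab
wide.columns = list(wide.iloc[0])                                  # Dynamic Rename: names from row 1
wide = wide.iloc[1:].reset_index(drop=True)                        # drop the header row from the data
```

`explode` plus `cumcount` does the split-to-rows-and-number work, and `pivot` is the Cross-Tab, so the column count falls out of the data rather than being configured anywhere.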