
Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Input Data Errors

tom_dup_567
7 - Meteor

Being new to Alteryx, I have noticed that when inputting CSV files I get warnings such as "An unescaped quote was found," with the record number identified. In my data this occurs because some fields contain entries such as 24" (24 inches). I use the Formula tool with a Replace function to deal with the rogue quote during the workflow. My question is about the initial error: is there a way to resolve it so that I know the data moving through the workflow is getting cleaned? For instance, is there a way to create something like a pre-sub-routine and stage the two workflows so that I get a Results message for each routine? I'm considering this approach because in this scenario, and many others, you get no verification that your formula resolved all the warnings, especially if the problem occurs in multiple columns, and because the warning is only reported during the input process.
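To make the problem concrete, here is a minimal Python sketch (not Alteryx) of a pre-scrub step along the lines described: a hypothetical `clean_rogue_quotes` helper rewrites the inch-mark quote before the line ever reaches the CSV parser, so the parsed output can be verified clean. The regex pattern is an assumption about this particular data shape.

```python
import csv
import io
import re

def clean_rogue_quotes(line: str) -> str:
    # Hypothetical pre-scrub: turn an inch mark (a digit followed by ")
    # into the word "in" before the CSV parser ever sees the line.
    return re.sub(r'(\d)"', r'\1 in', line)

raw = 'id,desc\n1,pipe 24" steel\n2,plain'
cleaned = "\n".join(clean_rogue_quotes(l) for l in raw.splitlines())
rows = list(csv.reader(io.StringIO(cleaned)))
```

Staging this as its own step mirrors the "prescrub workflow" idea: the second stage only ever sees lines that have already been rewritten.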

 

A similar problem occurs with rows that are all Null values. The warning tells me there are not enough fields in the record. I know why it occurs and I deal with it in the workflow, but I would like to handle it during the input process and know it is resolved before the data hits the main part of the workflow. Thanks.

8 REPLIES
jdunkerley79
ACE Emeritus

Assuming you don't have any quoted text:

 

The way to do this is to set the 'Ignore Delimiters in' option on the input tool to None:

[Screenshot: Input Data tool configuration, 'Ignore Delimiters in' set to None]

 

By default, Alteryx treats " as the character that quotes a string.
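As a rough Python analogue of that setting (a sketch, not the Alteryx behaviour itself), `csv.QUOTE_NONE` tells the reader to treat quote characters as plain data, so the inch marks pass through without any "unescaped quote" complaint:

```python
import csv
import io

raw = 'id,size\n1,24"\n2,36"'
# quoting=csv.QUOTE_NONE is the rough equivalent of setting
# 'Ignore Delimiters in' to None: quotes are plain characters.
rows = list(csv.reader(io.StringIO(raw), quoting=csv.QUOTE_NONE))
```

The trade-off is the same in both tools: delimiters inside quoted text are no longer protected, which is why this only works if you have no quoted text.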

jack_morgan
9 - Comet

Hey Tom,

@jdunkerley79's suggestion will get you started, and the sample workflow I've attached should take it home. It will bring in all of your quotes and just trim off the start and end quotes. Best of luck!
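The attached workflow isn't reproduced here, but the trimming logic it describes can be sketched in Python (names are my own):

```python
def trim_outer_quotes(value: str) -> str:
    # Drop one leading and one trailing double quote, if both are
    # present, leaving interior quotes such as 24" intact.
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return value
```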

Jack

tom_dup_567
7 - Meteor

My data comes with everything in quotes, so I have to treat the input carefully; otherwise I have more issues to deal with inside the workflow. Both responses deal with the issue after the input warning occurs, and I've already done that. What I'm trying to do is eliminate the warnings that occur at the beginning so that I know the data is clean; ignoring the warnings can cause problems if my solutions don't catch everything. I don't see any solution other than a pre-scrub workflow that generates a new output, which I then use as input to the final workflow. The input messages on the final workflow then confirm that I do not have data issues. Thanks to both of you for your help.

jack_morgan
9 - Comet

Alternatively, you could put in a filter that removes data altogether if it isn't to spec. Get creative and use something like: StartsWith([field_name], '"') OR Right([field_name], 2) = '""'

 

Those conditions in combination would catch any value that starts with a double quote or ends with two double quotes; the last piece should account for the column that has a legitimate double quote in it. Don't limit yourself to only checking the data on the input; other tools can address this issue for you along the way.
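For anyone wanting to test the condition outside Alteryx, here is a Python mirror of that filter expression (the function name is my own):

```python
def looks_quoted(value: str) -> bool:
    # Mirrors the Alteryx filter expression:
    # StartsWith([field_name], '"') OR Right([field_name], 2) = '""'
    return value.startswith('"') or value[-2:] == '""'
```

Rows matching this predicate are the suspect ones; the rest can flow straight through.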

 

Jack

tom_dup_567
7 - Meteor

Jack, the filter is a good idea, then after I fix whatever is flagged I can blend it back in. Thanks!

cbridges
11 - Bolide

I have some nasty text files that were giving me problems too (not even CSV, but PRN, even though they can be interpreted as CSV).

 

Here are the steps that work for me...

 

In the Input tool, select Delimited, set the delimiter to none, and uncheck the box that says "First row contains headers".

 

Use the Text to Columns tool to split out your fields and set the delimiter to , (comma), or, as I needed in my case, ,\t (comma and tab). Check the option to ignore delimiters in quotes and set the number of columns to whatever you need. I typically set this too high and then reduce the number of columns after everything's parsed.

 

Next, if your headers are not in the first row, use a sample tool to skip down to the header row.

 

Use a Multi-Field Formula tool to strip the quotes: StripQuotes([_CurrentField_]).

 

Use the dynamic rename tool to get the field names from the first row.

 

Lastly, use a select tool to get rid of the first column because that column will be a concatenation of all your columns.

 

It's a bit lengthy, but it works for me.
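The steps above can be sketched end to end in Python (an assumed reimplementation, not the Alteryx tools themselves; note that unlike the Text to Columns tool, this naive split does not ignore delimiters inside quotes):

```python
def parse_raw_lines(lines, delimiter=",", header_row=0):
    # Split each single-field raw line on the delimiter, strip the
    # outer quotes from every field, then take the field names from
    # the header row and zip the remaining rows against them.
    split = [[f.strip().strip('"') for f in line.split(delimiter)]
             for line in lines]
    header = split[header_row]
    return [dict(zip(header, row)) for row in split[header_row + 1:]]

records = parse_raw_lines(['"id","name"', '"1","Ann"', '"2","Bob"'])
```

The `header_row` parameter plays the role of the Sample tool step: it lets you skip down to wherever the real headers start.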

 

@thizviz
jdunkerley79
ACE Emeritus

Had a bit of a play on the train this morning.

 

This follows a similar approach to the one @cbridges described:

 

- Read the data in as a single field

- Use a RegEx replace to switch " inside a text block to another symbol (I chose ¶)

- Text to Columns it

- Strip the quotes and reset ¶ back to "

- Select and Dynamic Rename to tidy up
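The attached workflow isn't shown here, but the placeholder trick can be sketched in Python (assumed logic, with my own names; like the original, it still presumes delimiters themselves only appear between fields):

```python
PILCROW = "\u00b6"  # the placeholder symbol chosen above

def parse_line(line: str, delim: str = ",") -> list[str]:
    # A quote counts as a field boundary only at the line edges or
    # next to a delimiter; any other quote is data, so swap it for
    # the placeholder before splitting, then restore it afterwards.
    chars = list(line)
    for i, c in enumerate(chars):
        if c == '"':
            boundary = (i == 0 or i == len(chars) - 1
                        or chars[i - 1] == delim or chars[i + 1] == delim)
            if not boundary:
                chars[i] = PILCROW
    fields = "".join(chars).split(delim)
    return [f.strip('"').replace(PILCROW, '"') for f in fields]
```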

 

I've attached my experiment; hope it helps.

 

 

kirkpoucher
6 - Meteoroid

Is there an easy way to dynamically update the number-of-columns value in the Text to Columns tool? I know I can do it with an Action tool and update the value with a formula, but I'm struggling to figure out how to get the number passed into the Action tool. I'm basically trying to package this up as a macro which reads the first line of data (which will contain field headers and, I know, will not contain extra quotes or special characters), counts the delimiters, and passes the output to the Action tool. I'm sure there is an easy way to create a variable or something that can be referenced at run-time, but I can't quite get there...
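In Alteryx this would be a Formula feeding an Action tool; the counting logic itself is simple, as this Python sketch shows (names are my own):

```python
def column_count(header_line: str, delim: str = ",") -> int:
    # The header row is assumed clean (no embedded quotes or
    # delimiters), so the column count is delimiters + 1.
    return header_line.count(delim) + 1
```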

 

