community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Distinguishing delimiter from value in importing data

Atom

Hi all

I have an issue in importing/cleaning data: I have received a large .csv file (~50M rows, ~10 target columns) delimited with commas; but some values also contain commas and Alteryx is treating them as delimiters, hence splitting the string into columns incorrectly (~2M rows). The additional complication is that the columns do not have a consistent number of characters. Is there a way to solve this? Any help appreciated.

Thanks,

MD

 

 

(Simplified) example:

 

Input:

Country,ID,Name,Sales,Margin
PL,1001,ABCDE,1000.00,500.00
PL,10101,DEF,G,HIJ,200.00,10.00

PL,101,XY,Z,,0.00

 

Target output:

CountryIDNameSalesMargin
PL1001ABCDE1000.00500.00
PL10101DEF,G,HIJ200.0010.00
PL101XY,Z 0.00

 

Current output:

CountryIDNameSalesMargin  
PL1001ABCDE1000.00500.00  
PL10101DEFGHIJ200.0010.00
PL101XYZ 0.00 
Alteryx Certified Partner
Alteryx Certified Partner
Easiest way is to request that the csv input have data have quotes. Alteryx can ignore delimiters inside of quotes.

“Frisch”,”Mark”,”Romeo,Juliet”

With your data I would import with \0 delimiter (add recordid) and filter rows with standard count of commas and find the ones with too many commas and then apply custom logic to those records.

Then you can union the data together and use text to fields to parse the data.

When creating the custom logic, you’ll have to see where in the records the bonus commas exist and create traps for them.

This is fragile and I’d recommend the use of tests to warn or error.

Cheers,

Mark
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.

Probably the easiest way is to use a REGEX tool in a Parse mode.

 

2017-10-08_13-55-53.jpg

Based on your sample data I put a quick example together

 

As @MarqueeCrew says use the Input tool with \0 separator to read the file in before parsing it

Meteoroid

@ 

@Jaslusser12 looking at source data all Margin cells had a value but not all Sales ones did.

 

The * allows for 0 character matches where as the + requires a character to match. Arguably could have used * for both.

Meteoroid

@jdunkerley79 - thank you for the prompt response. Very helpful.

Labels