Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

REGEX to Parse Flat file that cannot be fixed width due to varying column sizes

jenv20032000x
7 - Meteor

Greetings All,

I have 45 text files that I've tried multiple ways to load via alteryx dynamic input tool.  I have tried multiple methods and the best so far has been a flat file.  I've gotten my column headers and filtered out all irrelevant rows leaving the meat of the data shown here.  There are only 9 columns possible.  Column 2 is a date as well as column 3 (formatted DD-MMM-YYYY), then there are 5 columns of values followed by the final column of 6 digits as well as a potential letter.  These values are text so the commas and decimal points are important, especially where other currencies exist. The first column is what trips me the hardest.  You can see it has varying digit and letter combinations and of various sizes.  

 

I will also mention I've tried the text to column using space as delimiter and I get 70 columns, then trying to figure out which columns should line up with which is overwhelming to say the least.  I've tried this for parsing (\d{2}-\u{3}-\d{4}) but obviously that only gives me one date column, I need both.  Every time I duplicate that parsing string, I get nulls.

 

Can you please offer suggestions on how to parse the 9 columns?  I've burned two days on this and I just can't get the right results without your help!  I appreciate your help in advance.Regex snip.PNG

4 REPLIES 4
binuacs
20 - Arcturus

@jenv20032000x Can you provide the input file in the text format or excel?

NeoInfiniTech
8 - Asteroid

Hello @jenv20032000x, this expression should work with the RegEx tool:

 

^(.+) +([0-9]{2}-[A-Z]{3}-[0-9]{4}) +([0-9]{2}-[A-Z]{3}-[0-9]{4}) +([0-9,-]+) +([0-9,-]+) +([0-9,-]+) +([0-9,-]+) +([0-9,-]+) +([0-9]+)(?:[^0-9]+)?$

 

 

I added an additional (?:[^0-9]+)? to the end in case there are trailing spaces or non-digit characters, you can delete it if not necessary.

 

Edit: Slightly modified expression to also support decimal points if there are any:

 

^(.+) +([0-9]{2}-[A-Z]{3}-[0-9]{4}) +([0-9]{2}-[A-Z]{3}-[0-9]{4}) +([0-9,\.-]+) +([0-9,\.-]+) +([0-9,\.-]+) +([0-9,\.-]+) +([0-9,\.-]+) +([0-9]+)(?:[^0-9]+)?$

 

 

jenv20032000x
7 - Meteor

NeoInfiniTech, I can't thank you enough!  Wish I had asked sooner.  Thank you and have a blessed day!

jenv20032000x
7 - Meteor

@NeoInfiniTech I can't begin to thank you enough!  I missed adding the @sign so you would see this post so here I am posting my thanks again.  This worked wonderfully and I'm truly grateful.  Have a blessed day!

Labels