Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing out the file name

ayoun
6 - Meteoroid

Hello all,

 

I am trying to do some file name validations and I need some help in parsing out just the filename and the date. 

 

image.png

 

I essentially need the output to be as follows:

 

File NameDate
AL01312017
BAR01312017
BR01312017
BRT01312017
HDR01312017
Test01312017
TL01312017
TT01312017

 

 

Text-to-Column would typically work using "_" as the delimiter, but this can cause issues if the user inputs the file name wrongfully (i.e. misses an "_" or inputs "__").

 

Any assistance would be appreciated!

 

Thanks

 

5 REPLIES 5
tcroberts
12 - Quasar

I've used a regex here, let me know if it helps, or if you need some clarification of what's happening:

 

parsefilenameregex.PNG

ayoun
6 - Meteoroid

Wow you are a life saver.  If you don't mind would it be possible to breakdown the RegEx and provide explanation?

 

\d\_*([A-Z]+)\_*(\d+)\_*.*

 

If it's too much work, no worries.  

 

Thank you again, much appreciated.

tcroberts
12 - Quasar

No problem!

 

"\d\_*" says: look for a digit, followed by any number of underscores. The "\" before the underscore tells the RegEx to look for the literal character "_". This means that it would match: "6_", "5___", "1"

 

The round brackets are what's called a Capturing Group, meaning, parse this bit out into a new column.

 

"([A-Z]+)\_*" says: One or more capital letters, followed by any number of underscores. Notice that the \_ is outside of the capturing group, so that we do not keep this in our parsed column.

 

"(\d+)\_*.*" says: One or more digits, followed by any number of underscores, followed by any other characters.

 

Hopefully this was clear, let me know if there's anything you want me to expand on.

 

 

ayoun
6 - Meteoroid

What if in cases where I wanted to split into 3 columns instead?

 

Firm NameFile NameDate
Alberta_Counsel_15175AL01312017
Alberta_Counsel_15175BAR01312017
Alberta_Counsel_15175BR01312017
Alberta_Counsel_15175BRT01312017
Alberta_Counsel_15175HDR01312017
Alberta_Counsel_15175Test01312017
Alberta_Counsel_15175TL01312017
Alberta_Counsel_15175TT01312017
tcroberts
12 - Quasar

In that case, you'd use:

 

"(.*\d)\_*([A-Z]+)\_*(\d+)\_*.*"

 

The difference is that I've added an additional capturing group (), with .* before the \d, meaning that it will catch any number of any characters, ending with a digit.

 

Is this what you were looking for?

Labels