

Tokenize via RegEx

G_WOLAK
7 - Meteor

I've used the Download tool to pull down the names of the files present on our SFTP site. The results come back with all of the file names in a single cell. I'm trying to use RegEx to tokenize this data, but I run into a problem when I get to the file name itself, which, as you can imagine, can vary greatly.

 

The Download tool places the entire listing from the FTP site in a single cell. Example below.

 

-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 ABC 123.txt
-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 123 ABC.txt

 

The RegEx flow I've put together thus far is below.

 

-[[:alpha:]]{2}-[[:alpha:]]{2}----\s{3}\d\s[[:alpha:]]{2}-[[:alpha:]]{4}\s{2}[[:alpha:]]{2}-[[:alpha:]]{5}\s+\d+\s[[:alpha:]]{3}\s\d+\s{2}\d{4}\s

 

Resulting in the output shown below

 

[Screenshot: G_WOLAK_0-1584556750953.png]

 

At this point my efforts fall apart because the file names differ. If I add .+ to the end, it doesn't stop at the line break in the cell, resulting in this:

 

[Screenshot: G_WOLAK_1-1584556848367.png]
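The behavior described here, where a trailing .+ runs past the line break, can be reproduced outside Alteryx. The following is a minimal Python sketch, not Alteryx itself. It assumes the RegEx engine in use lets "." match line breaks (which the result described above suggests), and it imitates that with Python's re.DOTALL flag. The simplified prefix -rw\S*\s stands in for the longer expression in the post and is purely illustrative.

```python
import re

# Sample listing from the post: two entries separated by a line break.
listing = (
    "-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 ABC 123.txt\n"
    "-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 123 ABC.txt"
)

# Assumption: the engine's "." also matches line breaks.
# re.DOTALL imitates that here, so ".+" runs to the end of the cell.
swallowed = re.findall(r"-rw\S*\s.+", listing, flags=re.DOTALL)

# Excluding line-break characters explicitly stops each token
# at the end of its own line.
per_line = re.findall(r"-rw\S*\s[^\r\n]+", listing)

print(len(swallowed))  # one token holding both entries
print(len(per_line))   # one token per entry
```

Replacing the trailing .+ with a class that excludes line breaks, such as [^\r\n]+, is one way to keep each entry in its own token, assuming the entries are separated by line breaks as in the example.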

 

How can I write the end of my RegEx so that it accounts for the varying file names and also places each file name on its own row in the output?

Thableaus
17 - Castor

Hi @G_WOLAK 

 

Use the RegEx tool in Parse mode with this expression:

 

.*\d{4}\s+(.*?\.\w+)\s*.*

 

This captures everything after the four-digit year, up to and including the file extension (a period followed by word characters).
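To see what the capture group picks up, here is a rough Python sketch that applies the same expression to the sample listing from the question. It is only an illustration: Python's re module stands in for the RegEx tool, the listing is split into lines first (an assumption, comparable to splitting the cell to rows before parsing), and extract_file_names is a hypothetical helper name.

```python
import re

# Sample listing from the question: the whole directory listing in one string.
listing = (
    "-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 ABC 123.txt\n"
    "-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 123 ABC.txt"
)

# Expression from the reply; group 1 is the file name.
pattern = re.compile(r".*\d{4}\s+(.*?\.\w+)\s*.*")


def extract_file_names(text):
    """Hypothetical helper: apply the expression to each line of the listing."""
    names = []
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            names.append(match.group(1))
    return names


file_names = extract_file_names(listing)
print(file_names)  # ['ABC 123.txt', '123 ABC.txt']
```

The leading .* is greedy, so the match anchors on the last four-digit number followed by whitespace on each line, which in this sample is the year.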

 

Cheers,

 

G_WOLAK
7 - Meteor

Thank you. 

 

I didn't know how to match a literal period, which I now see is done with \.

 

Thanks for also simplifying the expression.
