Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How do I extract a serial # from a file path when its not always in the same position?

hellyars
13 - Pulsar

My source data has one field.  That one field is a file path.  There is sub-directory in each file path that uses a serial # for its name.  I am trying to isolate that serial #.   Nine times out of ten, that serial # is the first child in a file's path.   But, nothing is ever that easy.  Sometimes, it is the 3rd or 4th child in the path.  

 

I tried using the Text to Columns to parse the serial #.  It easily isolates the serial # for those entries where it appears as the first child under the file path.  But, how do I extract it for when it is not?

 

Here is an example of the source data...

 

The serial # is always 13-digits or 16 characters with the dashes.  

 

How do I extract just the serial #?

 

PathWhat I Want  
AIL\5996-01-476-11365996-01-476-1136  
AIL\5996-01-502-14175996-01-502-1417  
Another Customer\Their Book 1\1095-00-866-23981095-00-866-2398  
AIL\5996-56-582-46575996-56-582-4657  
Another\Yada\Their Yada Yada\5487-98-567-56415487-98-567-5641  

 

4 REPLIES 4
jasperlch
12 - Quasar

Hi @hellyars

 

The serial numbers can be extract well by Regex tool: 

 

Method 1: if you are sure that the pattern of the serial number is 4 digits, than a dash, 2 digits, a dash, 3 digits, a dash, 4 digits 

what is the pattern of the 16-character serial number?

 

Method 2: if you are sure the serial will always be all texts after the last occurrence of '\' 

 

Capture1.PNG

patrick_digan
17 - Castor
17 - Castor

@hellyars If your path always has slashes (\) and the serial number is at the end, you could use this formula to get the serial #:

FileGetFileName([Path])

It works on your sample data.

hellyars
13 - Pulsar

Method 1 worked perfectly.

 

I stumbled into another solution as well.

 

I used the Text to Parse tool against the Path.   I then filtered the results using the formula regex_match(LEFT([Path2],1),"^\d").  I then further used the Text to Parse tool to parse the results until I was able to isolate the serial #.   I then used the Union tool to join with the initial true results.  

 

Both methods seem to work.  Your approach is more direct.  I am using it now.  Thanks.

hellyars
13 - Pulsar

No, there are always sub-directories and files in the path after the serial #. 

Labels