Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex FileName with different naming conventions

zekharya
6 - Meteoroid

Hi all,

 

I have different files with naming as follows:

Real estates [Data centre] [SGD, RM]

Real estates [Office] (SGD) reduce ROU

Real estates [Properties] [Part 1 (SOA)] [SGD]

Real estates [Properties] [Part 2 (1541)] [SGD]

Real estates [Properties] [Part 3] [SGD]

 

I would like to seek your help on extracting the following outputs in this desired manner:

Real estatesData centre[SGD, RM]
Real estatesOffice(SGD) reduce ROU
Real estatesProperties[Part 1 (SOA)] [SGD]
Real estatesProperties[Part 2 (1541)] [SGD]
Real estatesProperties[Part 3] [SGD]

 

My current Regular Expression (\<\w+\s+\w+\>).(\<\w+\s+\w+\>)*(\[.*\]).*(\[.*\]) is having an issue.

And my formula is Replace(Replace([RegExOut1], "[", ""),"]","")

My current results are as follows:

Real estatesData centre[SGD, RM]
   
Real estatesProperties Part 1 (SOA)[SGD]
Real estatesProperties Part 2 (1541)[SGD]
Real estatesProperties Part 3[SGD]

 

Thank you in advance for your help!

6 REPLIES 6
DataNath
17 - Castor

Hey @zekharya, will this always be the structure of your data? If so you could perhaps look to use something like this:

 

([^[]+)\s\[([^]]+)\]\s(.+)

 

DataNath_0-1669023885261.png

 

We're basically extracting sections of the string here based upon the square brackets.

zekharya
6 - Meteoroid

Hi @DataNath, thank you for the super fast response!

 

However, I made a mistake earlier as the workflow uses the full path instead of file name. Could you assist in providing the same result but with the source as below? Or is there a method in alteryx that allows me to just extract the filename aka just the part in bold first before I can use your given solution as above? Thank you so so much.

 

ie

C:\Users\zgoh022\Downloads\Brand\Alteryx Attempt\Real estates [Data centre] [SGD, RM].xlsx|||'#1 Korea$'

C:\Users\zgoh022\Downloads\Brand\Alteryx Attempt\Real estates [Office] (SGD) reduce ROU.xlsx|||'#3 Amazon$'

C:\Users\zgoh022\Downloads\Brand\Alteryx Attempt\Real estates [Properties] [Part 1 (SOA)] [SGD].xlsx|||'#4 Japan$'

DataNath
17 - Castor

Hey @zekharya, no problem. The RegEx is a simple change, just a slight adaptation to allow for extra bits (i.e. the rest of the filepath) either side of the target pattern(s):

DataNath_0-1669025226527.png

 

.+\\([^[]+)\s\[([^]]+)\]\s(.+)\.

 

Where is this field coming from? If it's the directory tool then that will also output a [FileName] field rather than using the [FullPath].

 

Quick edit: Noticed you had bolded the file extension (.xlsx) so if you want to keep that, you could change the expression to something like this;

 

DataNath_1-1669025388159.png

 

.+\\([^[]+)\s\[([^]]+)\]\s([^|]+)

 

zekharya
6 - Meteoroid

Hi @DataNath , thank you so much for your help. The regular expression you have kindly provided works perfectly!

.+\\([^[]+)\s\[([^]]+)\]\s(.+)\.

 

However, is it possible to include the final part in bold as one of the outputs as well?

C:\Users\zgoh022\Downloads\Brands\Alteryx Attempt\Real estates [Data centre] [SGD, RM].xlsx|||'#1 KL DC$'

 

So that the output is as follows:

Real estatesData centre [SGD, RM]#1 KL DC

 

Was tidying the workflow and realised this issue ><

Thanks once again!

DataNath
17 - Castor

Hey @zekharya, no problem! Try the following:

 

.+\\([^[]+)\s\[([^]]+)\]\s(.+)\.xlsx\|{3}'(.+)\$

zekharya
6 - Meteoroid

Sorry for the late reply, it works, thank you so much for your help!

Labels