Alteryx Designer Desktop Discussions

zekharya · ‎11-21-2022

Hi all,

I have different files with naming as follows:

Real estates [Data centre] [SGD, RM]

Real estates [Office] (SGD) reduce ROU

Real estates [Properties] [Part 1 (SOA)] [SGD]

Real estates [Properties] [Part 2 (1541)] [SGD]

Real estates [Properties] [Part 3] [SGD]

I would like to seek your help on extracting the following outputs in this desired manner:

Real estates	Data centre	[SGD, RM]
Real estates	Office	(SGD) reduce ROU
Real estates	Properties	[Part 1 (SOA)] [SGD]
Real estates	Properties	[Part 2 (1541)] [SGD]
Real estates	Properties	[Part 3] [SGD]

My current Regular Expression (\<\w+\s+\w+\>).(\<\w+\s+\w+\>)*(\[.*\]).*(\[.*\]) is having an issue.

And my formula is Replace(Replace([RegExOut1], "[", ""),"]","")

My current results are as follows:

Real estates	Data centre	[SGD, RM]

Real estates	Properties Part 1 (SOA)	[SGD]
Real estates	Properties Part 2 (1541)	[SGD]
Real estates	Properties Part 3	[SGD]

Thank you in advance for your help!

DataNath · ‎11-21-2022

Hey @zekharya, will this always be the structure of your data? If so you could perhaps look to use something like this:

([^[]+)\s\[([^]]+)\]\s(.+)

We're basically extracting sections of the string here based upon the square brackets.

zekharya · ‎11-21-2022

Hi @DataNath, thank you for the super fast response!

However, I made a mistake earlier as the workflow uses the full path instead of file name. Could you assist in providing the same result but with the source as below? Or is there a method in alteryx that allows me to just extract the filename aka just the part in bold first before I can use your given solution as above? Thank you so so much.

ie

C:\Users\zgoh022\Downloads\Brand\Alteryx Attempt\Real estates [Data centre] [SGD, RM].xlsx|||'#1 Korea$'

C:\Users\zgoh022\Downloads\Brand\Alteryx Attempt\Real estates [Office] (SGD) reduce ROU.xlsx|||'#3 Amazon$'

C:\Users\zgoh022\Downloads\Brand\Alteryx Attempt\Real estates [Properties] [Part 1 (SOA)] [SGD].xlsx|||'#4 Japan$'

DataNath · ‎11-21-2022

Hey @zekharya, no problem. The RegEx is a simple change, just a slight adaptation to allow for extra bits (i.e. the rest of the filepath) either side of the target pattern(s):

.+\\([^[]+)\s\[([^]]+)\]\s(.+)\.

Where is this field coming from? If it's the directory tool then that will also output a [FileName] field rather than using the [FullPath].

Quick edit: Noticed you had bolded the file extension (.xlsx) so if you want to keep that, you could change the expression to something like this;

.+\\([^[]+)\s\[([^]]+)\]\s([^|]+)

zekharya · ‎11-23-2022

Hi @DataNath , thank you so much for your help. The regular expression you have kindly provided works perfectly!

.+\\([^[]+)\s\[([^]]+)\]\s(.+)\.

However, is it possible to include the final part in bold as one of the outputs as well?

C:\Users\zgoh022\Downloads\Brands\Alteryx Attempt\Real estates [Data centre] [SGD, RM].xlsx|||'#1 KL DC$'

So that the output is as follows:

Real estates

Data centre

[SGD, RM]

#1 KL DC

Was tidying the workflow and realised this issue ><

Thanks once again!

DataNath · ‎11-23-2022

Hey @zekharya, no problem! Try the following:

.+\\([^[]+)\s\[([^]]+)\]\s(.+)\.xlsx\|{3}'(.+)\$

zekharya · ‎12-11-2022

Sorry for the late reply, it works, thank you so much for your help!

Alteryx Designer Desktop Discussions

Regex FileName with different naming conventions

Re: Unable to get an output

Re: Extracting the list of sheet names across mult...

Re: Firm names parse

Re: Help with Multi-Row formula

Re: Assign Random data to Executive with limited p...