Regex FileName with different naming conventions
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi all,
I have different files with naming as follows:
Real estates [Data centre] [SGD, RM]
Real estates [Office] (SGD) reduce ROU
Real estates [Properties] [Part 1 (SOA)] [SGD]
Real estates [Properties] [Part 2 (1541)] [SGD]
Real estates [Properties] [Part 3] [SGD]
I would like to seek your help on extracting the following outputs in this desired manner:
Real estates | Data centre | [SGD, RM] |
Real estates | Office | (SGD) reduce ROU |
Real estates | Properties | [Part 1 (SOA)] [SGD] |
Real estates | Properties | [Part 2 (1541)] [SGD] |
Real estates | Properties | [Part 3] [SGD] |
My current Regular Expression (\<\w+\s+\w+\>).(\<\w+\s+\w+\>)*(\[.*\]).*(\[.*\]) is having an issue.
And my formula is Replace(Replace([RegExOut1], "[", ""),"]","")
My current results are as follows:
Real estates | Data centre | [SGD, RM] |
Real estates | Properties Part 1 (SOA) | [SGD] |
Real estates | Properties Part 2 (1541) | [SGD] |
Real estates | Properties Part 3 | [SGD] |
Thank you in advance for your help!
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey @zekharya, will this always be the structure of your data? If so you could perhaps look to use something like this:
([^[]+)\s\[([^]]+)\]\s(.+)
We're basically extracting sections of the string here based upon the square brackets.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @DataNath, thank you for the super fast response!
However, I made a mistake earlier as the workflow uses the full path instead of file name. Could you assist in providing the same result but with the source as below? Or is there a method in alteryx that allows me to just extract the filename aka just the part in bold first before I can use your given solution as above? Thank you so so much.
ie
C:\Users\zgoh022\Downloads\Brand\Alteryx Attempt\Real estates [Data centre] [SGD, RM].xlsx|||'#1 Korea$'
C:\Users\zgoh022\Downloads\Brand\Alteryx Attempt\Real estates [Office] (SGD) reduce ROU.xlsx|||'#3 Amazon$'
C:\Users\zgoh022\Downloads\Brand\Alteryx Attempt\Real estates [Properties] [Part 1 (SOA)] [SGD].xlsx|||'#4 Japan$'
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey @zekharya, no problem. The RegEx is a simple change, just a slight adaptation to allow for extra bits (i.e. the rest of the filepath) either side of the target pattern(s):
.+\\([^[]+)\s\[([^]]+)\]\s(.+)\.
Where is this field coming from? If it's the directory tool then that will also output a [FileName] field rather than using the [FullPath].
Quick edit: Noticed you had bolded the file extension (.xlsx) so if you want to keep that, you could change the expression to something like this;
.+\\([^[]+)\s\[([^]]+)\]\s([^|]+)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @DataNath , thank you so much for your help. The regular expression you have kindly provided works perfectly!
.+\\([^[]+)\s\[([^]]+)\]\s(.+)\.
However, is it possible to include the final part in bold as one of the outputs as well?
C:\Users\zgoh022\Downloads\Brands\Alteryx Attempt\Real estates [Data centre] [SGD, RM].xlsx|||'#1 KL DC$'
So that the output is as follows:
Real estates | Data centre | [SGD, RM] | #1 KL DC |
Was tidying the workflow and realised this issue ><
Thanks once again!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Sorry for the late reply, it works, thank you so much for your help!
