Parsing a delimited string with RegEx
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello everyone,
While doing self practice on UseCae-1 from AlexKo's Tool Mastery | Cross Tab, I did parsed the delimited string using a RegEx, although my RegEx produced an extra column holding the parsed delimiter with spaces but the expression worked fine and just needed to delete the extra column with Select Tool.
Sample of the input data
As you can see from the snapshot that the (-) exist as delimiter and also part of the name, so I wrote the below RegEx
(\u*)((?:\s[-]\s)|(?:[-]\s))(.*)
My question is what is a cleaner way to write the expression? I am new to RegEx and hope to get more experience to use this powerful tool.
Thanks for the help.
AliAS
Solved! Go to Solution.
- Labels:
- Regex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @AliAS2020
Change your expression to (\u*)\s*-\s*(.*). The start and end capture groups are the same as yours. The middle part in green matches zero or spaces followed by a hyphen followed by zero or spaces. There are no brackets around the middle, so it won't be captured as a column, removing the need for the extra Select tool to remove it.
Dan
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @danilang
Thanks that helped a lot, I did something similar but it didn't work since I kept the brackets, now this is a cleaner version and no need to the extra select tool.
Stay safe
AliAS
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @AliAS2020,
There is a cleaner way to parse the data using RegEx for this example. Try using a RegEx parse with the formula (\u*)\s*-\s+(.*)
The delimiter is a dash followed by a space, but sometimes preceded by a space, which is the \s*-\s+ where:
\s* zero or more spaces
- dash (not required to be in brackets as it is not a special character)
\s+ one or more spaces
By not including the delimiter in a marked groups it avoids the need to data cleanse it out later. I have attached how I would solve this (in Container 98). Note changes to default field settings in the Multi-Row Formula and Cross Tab tools and the name change to the Output root name in the Text to Columns
