Hello,
I am trying to extract some data within a couple of string fields. I have used text to columns, however i am not sure if there is a better way to do this? I have two columns TEXT 1 and TEXT 2 these both contain codes within the hashtags. I would like to extract these codes so the outputs are like columns "PARSE TEXT 1" "PARSE TEXT 2". I would then like to extract from TEXT 1 the date into it's own column.
I would be very grateful for any ideas on the best way to use this - i did wonder whether RegEx would work?
Example below.
TEXT 1 | TEXT 2 | PARSE TEXT 1 | PARSE TEXT 2 | DATE EXTRACT TEXT 1 |
LH1234 - 12 Jun 2021 - #ABC# | notified customer #JKL# | ABC | JKL | 12 JUN 2021 |
LH1234 - 10 Jun 2021 - #DEF# | notified customer #MNO# | DEF | MNO | 10 JUN 2021 |
LH1234 - 01 Jun 2021 - #GHI# | notified customer #PQR# | GHO | PQR | 01 JUN 2021 |
Solved! Go to Solution.
You can do this using regEx 🙂
Please see below,
Attached the workflow,
Hope this helps!
Regards,
Hi @Lauren_Holmes,
You could use three Regex_Replace() functions to solve this, or use a regular Regex tool either method would work! I've gone for the former:
Formula 1: Extracting ABC
REGEX_Replace([TEXT 1], '.*?#(.*?)#.*', '$1')
Formula 2: Extracting JKL
REGEX_Replace([TEXT 2], '.*?#(.*?)#.*', '$1')
Formula 3: Extracting Date
TRIM(REGEX_Replace([TEXT 1], '.*?-(.*?)-.*', '$1'))
I've attached my workflow for you to download if needed!
Kind regards,
Jonathan
Your hunch is right, Regex can solve it. Here is one possible way.
Dawn.
Hi @KarolinaRoza do you have any suggestions say when a row has more than one code?
Currently it is keeping the last # and the first # for example output ABC# #DEF however i would ideally have below:
TEXT 1 | IDEAL OUTPUT |
LH1234 - 30 JUN 2021 - #ABC# #DEF# | ABC DEF |
LH1234 - 30 JUN 2021 - #ABC# #DEF# #GHI# | ABC DEF GHI |
You can use the Regex tool, set on tokenise (i.e. it picks up everything that matches a pattern) and input to rows, follow by summarize manipulation. Something like the below is one possible way.
Cheers,
Dawn.
In a formula tool you can use:
replace(REGEX_Replace([TEXT 1],".*?(\#.*\#).*","$1"),"#","")
That's a fairly different problem than the initial query.
Thank you so much :)!
My workflow oddly adds a comma in-between each code where as yours does not so I will need to investigate why that is!