Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Extracting data within a string field split into different columns

Lauren_Holmes
8 - Asteroid

Hello,

I am trying to extract some data within a couple of string fields. I have used text to columns, however i am not sure if there is a better way to do this? I have two columns TEXT 1 and TEXT 2 these both contain codes within the hashtags. I would like to extract these codes so the outputs are like columns "PARSE TEXT 1" "PARSE TEXT 2". I would then like to extract from TEXT 1 the date into it's own column.

 

I would be very grateful for any ideas on the best way to use this - i did wonder whether RegEx would work?

 

Example below.

 

TEXT 1TEXT 2PARSE TEXT 1PARSE TEXT 2DATE EXTRACT TEXT 1
LH1234 - 12 Jun 2021 - #ABC#notified customer #JKL#ABCJKL12 JUN 2021
LH1234 - 10 Jun 2021 - #DEF#notified customer #MNO#DEFMNO10 JUN 2021
LH1234 - 01 Jun 2021 - #GHI#notified customer #PQR#GHOPQR01 JUN 2021
11 REPLIES 11
messi007
15 - Aurora
15 - Aurora

@Lauren_Holmes,

 

You can do this using regEx 🙂

Please see below,

messi007_0-1625063299830.png

Attached the workflow,

 

Hope this helps!

 

Regards,

apathetichell
19 - Altair

I kept in the "#"s in the original - here's a corrected version.

Jonathan-Sherman
15 - Aurora
15 - Aurora

Hi @Lauren_Holmes,

 

You could use three Regex_Replace() functions to solve this, or use a regular Regex tool either method would work! I've gone for the former:

 

Formula 1: Extracting ABC

REGEX_Replace([TEXT 1], '.*?#(.*?)#.*', '$1')

 

Formula 2: Extracting JKL

REGEX_Replace([TEXT 2], '.*?#(.*?)#.*', '$1')

 

Formula 3: Extracting Date

TRIM(REGEX_Replace([TEXT 1], '.*?-(.*?)-.*', '$1'))

 

I've attached my workflow for you to download if needed!

 

Kind regards,

Jonathan

DawnDuong
13 - Pulsar
13 - Pulsar

hi @Lauren_Holmes 

Your hunch is right, Regex can solve it. Here is one possible way.
Dawn.

KarolinaRoza
11 - Bolide

hi @Lauren_Holmes 

 

You can use Regex.

Please find my solution:

 

KarolinaRoza_0-1625063698252.png

 

 

cheers,

Karolina

Lauren_Holmes
8 - Asteroid

Hi @KarolinaRoza do you have any suggestions say when a row has more than one code?

Currently it is keeping the last # and the first # for example output ABC# #DEF however i would ideally have below:

 

TEXT 1IDEAL OUTPUT
LH1234 - 30 JUN 2021 - #ABC# #DEF#ABC DEF
LH1234 - 30 JUN 2021 - #ABC# #DEF# #GHI#ABC DEF GHI
DawnDuong
13 - Pulsar
13 - Pulsar

hi @Lauren_Holmes 

You can use the Regex tool, set on tokenise (i.e. it picks up everything that matches a pattern) and input to rows, follow by summarize manipulation. Something like the below is one possible way.

Cheers,
Dawn.

RegexTokenise.PNG

 

apathetichell
19 - Altair

In a formula tool you can use:

replace(REGEX_Replace([TEXT 1],".*?(\#.*\#).*","$1"),"#","")

 

That's a fairly different problem than the initial query.

Lauren_Holmes
8 - Asteroid

Thank you so much :)!

 

My workflow oddly adds a comma in-between each code where as yours does not so I will need to investigate why that is! 

Labels