Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Text Parsing - Extracting key text

CiaranA
10 - Fireball

Hi all, 

 

I was wondering the best way to extract a Station name from a field where station name appears in quite different circumstances in each record.

 

One way that I tried to use the text to columns tool but my key text i wish to extract appears in varying columns. 

I could do some filtering, formulae to remove specific text like "STN", or "Station" sorting each filtered set of records and then union-ing together but....

 

Ideally i'd like to pull out any text before the word "Station" or "STN" from the one field if possible before using text to columns and have a clean set of Station names as records. 

 

Any help appreciated!

 

Ciaran

4 REPLIES 4
Bob_Blackey
11 - Bolide

@CiaranA

 

I would use the Regex tool and the Parse functionality.

 

the Regex would be:

 

(.+)(STATION|STN)(.*)

 

The first set of parens give you the Station Name

The second set give you either STATION or STN   (it's not case sensitive by default)

The third give you all the following text.

 

Cheers,

Bob

 

DanS
9 - Comet

Hey Ciaran,

 

Try this with the RegEx tool: 

.+(?=Station|STN)

With the output method as Tokenize, this should be able to split a clean list for you. 

 

Thanks,

Dan

nick_ceneviva
11 - Bolide

Based on the screenshot that you included, I think probably the best way to tackle this is to figure out where the first occurrence of either "Station" or "STN" occurs.  From there, you can use the substring function to get the string from the start to the occurrence of Station.  The formula would look something like this for "Station":

 

Substring([Stations],0,FindString([Stations], "Station"))

 

You could use a conditional statement to check for both Station or STN.  Let me know if you would like me to take a look at what that formula would look like or if you that explanation doesn't quite work.

 

NicoleJohnson
ACE Emeritus
ACE Emeritus

Slight modification to @Bob_Blackey RegEx expression that I think will accomplish your request:

 

(.*)\s(Station||STN),\s(.*)

 

This accounts for the space in between your station name & the word Station/STN, and then removes the comma & space following before providing the remaining text. So you'll get one column with station name, one with whichever was present in the original data Station/STN, and one with the remaining text.

 

Additional question: Are there any scenarios where there are instances of words following the Station/STN word that you want to include? I.e. something like "Glasgow Station East" where you would want to return "Glasgow East"? Just checking. Would require some tweaking if so. :)

 

NJ

Labels