

Extract A Particular Word with the Previous One

insomned
8 - Asteroid

Hello, 

 

I have a dataset which looks something like:

 

xyz - Project Earth - some info - some more

xyz - Project Sun - some info

xyz : Project Hello ; some info

 

and this is all written in one line.

 

I would like to extract "Project" and the first word after it (e.g. Project Earth; Project Sun) and have the results in the same column. The delimiter before and after "Project" and its name is not always a dash!

 

Thank you! 

ShankerV
17 - Castor

Hi @insomned 

 

Please find the expected output.

 

ShankerV_0-1675413686883.png

 

 

Many thanks

Shanker V

BS_THE_ANALYST
13 - Pulsar

@insomned I think using Tokenize in the RegEx tool is exactly what you need here! Given that all your text is in one line (i.e. one cell):
Before:

BS_THE_ANALYST_0-1675415985973.png

After: (I put a semicolon followed by a space between each of the Project Xs. Use whatever delimiter suits you.)

BS_THE_ANALYST_1-1675416022694.png

 

Explaining the steps:
Identify the pattern you want to extract. I chose (Project\s+\w+). This means: find Project, followed by one or more whitespace characters, and then the next word after it.
The tool will go along the string, find each occurrence of this pattern, and split each occurrence into Rows (as highlighted in the picture below).

BS_THE_ANALYST_2-1675416254497.png
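For anyone who wants to check the pattern outside Designer, here is a minimal Python sketch of the same idea. It is not the Alteryx tool itself: `re.findall` stands in for Tokenize, the sample text is modeled on the question, and the `tokenize` helper name is my own.

```python
import re

# Sample text modeled on the question: all records sit on one line.
text = (
    "xyz - Project Earth - some info - some more "
    "xyz - Project Sun - some info "
    "xyz : Project Hello ; some info"
)

# Same pattern as above: the literal word, whitespace, then one word.
PATTERN = re.compile(r"(Project\s+\w+)")

def tokenize(value: str) -> list[str]:
    """Return every non-overlapping match, scanning left to right."""
    return PATTERN.findall(value)

tokens = tokenize(text)
print(tokens)  # ['Project Earth', 'Project Sun', 'Project Hello']
```

Each element of the list corresponds to one output row of the split-to-rows step.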

 

After this, we bring in a Summarize tool and merge the rows back together using Concatenate. We must specify the delimiter that will separate the rows when they are joined:

BS_THE_ANALYST_3-1675416346054.png
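The Concatenate step can be pictured roughly like this in Python. This is only a sketch: the record ids and the `concatenate` helper are hypothetical, and the second record ("Project Moon") is made up to show that each source record gets its own joined string.

```python
from collections import defaultdict

# Hypothetical tokenized rows as (record id, token) pairs.
rows = [
    (1, "Project Earth"),
    (1, "Project Sun"),
    (1, "Project Hello"),
    (2, "Project Moon"),
]

def concatenate(rows, delimiter="; "):
    """Group tokens by record id and join each group with the delimiter."""
    grouped = defaultdict(list)
    for record_id, token in rows:
        grouped[record_id].append(token)
    return {rid: delimiter.join(tokens) for rid, tokens in grouped.items()}

result = concatenate(rows)
print(result[1])  # Project Earth; Project Sun; Project Hello
```

Changing `delimiter` plays the same role as changing the separator in the Summarize configuration.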

 

 

Emmanuel_G
13 - Pulsar

@insomned 

 

You can achieve this easily by using the RegEx tool in Tokenize mode, as shown below.

 

Let me know if it works as you want. 🙂

 

This is an article with more details about this tool: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-RegEx/ta-p/37689

 

Emmanuel_G_0-1675417238299.png

 

insomned
8 - Asteroid

Hi Emmanuel, 

 

Thanks a lot for your help! 

 

Would it actually be possible to extract two words after "Project"? Sometimes the project has a name that is longer than one word.

 

Thanks a lot!

Emmanuel_G
13 - Pulsar

@insomned 

 

Yes, absolutely!

 

You can achieve this just by using the Text To Columns tool, as shown below. You can specify all the separators you have in your string (as I did in the first screenshot) and the rest of the process will be automatic. 🙂

 

Let us know if it works as you want please.

Emmanuel_G_0-1675430156026.png

Emmanuel_G_1-1675430168521.png
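As a rough illustration of the split-on-separators idea (not the exact configuration in the screenshots, which I am not reproducing here), the Python sketch below splits on any of `-`, `:` or `;` and keeps the fields that start with "Project". The separator set, the `project_fields` helper and the "Project Blue Moon" sample are assumptions for the example.

```python
import re

# Assumed separator set: dash, colon or semicolon, with optional spaces around.
SEPARATORS = re.compile(r"\s*[-:;]\s*")

def project_fields(value: str) -> list[str]:
    """Split on any separator and keep the fields that start with 'Project '."""
    fields = SEPARATORS.split(value.strip())
    return [field for field in fields if field.startswith("Project ")]

print(project_fields("xyz - Project Blue Moon - some info"))
# ['Project Blue Moon']
print(project_fields("xyz : Project Hello ; some info"))
# ['Project Hello']
```

Because the whole field between two separators is kept, the project name can be any number of words. The catch is that a name containing one of the separators (for example a hyphen) would be cut short.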

 

BS_THE_ANALYST
13 - Pulsar

@insomned You can still use RegEx tokenize. You just need to add an optional non-capturing group to the regular expression:

BS_THE_ANALYST_0-1675431128075.png
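The exact expression is in the screenshot above. As a hedged sketch of what an optional non-capturing group can look like, the Python below uses `Project\s+\w+(?:\s+\w+)?`, which is my assumption and not necessarily the same expression. The `tokenize_two_words` name and the "Project Blue Moon" sample are also made up for the example.

```python
import re

# Assumed pattern: one word after "Project", plus an optional second word.
# (?: ... )? is a non-capturing group that may or may not be present.
PATTERN = re.compile(r"Project\s+\w+(?:\s+\w+)?")

def tokenize_two_words(value: str) -> list[str]:
    """Return each 'Project' match with one or two following words."""
    return PATTERN.findall(value)

sample = "xyz - Project Blue Moon - info xyz : Project Hello ; some info"
print(tokenize_two_words(sample))  # ['Project Blue Moon', 'Project Hello']
```

The second word is only picked up when it follows the first one directly, separated by whitespace. If a record has no delimiter after the name (e.g. "Project Sun some info"), this pattern would also grab the next word, so it relies on a separator being present.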

 

ShankerV
17 - Castor

Hi @insomned 

 

Here is one way to parse 2 or more words.

 

This will work whether there is 1 word or 2 words after Project.

 

ShankerV_0-1675431383424.png

 

Input was:

ShankerV_1-1675431397939.png

 

Many thanks

Shanker V

 
