Hello,
I have a dataset which looks something like:
xyz - Project Earth - some info - some more
xyz - Project Sun - some info
xyz : Project Hello ; some info
and this is all written in one line.
I would like to extract Project and the first word after it (e.g. Project Earth; Project Sun) and have them in the same column. The delimiter before and after Project and its name is not always a dash!
Thank you!
@insomned I think using Tokenize in the RegEx tool is exactly what you need here! Given that all your text is in one line (i.e. one cell):
Before:
After: (I put a semicolon followed by a space between each of the Project Xs. Use whatever delimiter suits you.)
Explaining the steps:
Identify the pattern you want to extract. I chose (Project\s+\w+). This means find Project, followed by space, and the next word after it.
So it will go along the string, find each occurrence of this, and split each occurrence into Rows (as highlighted in the picture below).
After this, we bring in a Summarize tool and merge the rows together using Concatenate. We must specify the delimiter that will separate the rows when they are joined:
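For anyone who wants to check the pattern outside Alteryx, here is a minimal Python sketch of the same two steps. It assumes the sample text from the question sits in one string; `re.findall` stands in for Tokenize (split to rows) and `str.join` stands in for Summarize with Concatenate. This is only an illustration of the logic, not how the Alteryx tools are implemented.

```python
import re

# Sample text modeled on the question: all records sit in one string.
text = (
    "xyz - Project Earth - some info - some more "
    "xyz - Project Sun - some info "
    "xyz : Project Hello ; some info"
)

# Tokenize step: collect every occurrence of "Project" plus the next word.
tokens = re.findall(r"Project\s+\w+", text)

# Summarize/Concatenate step: join the matches with a chosen delimiter.
result = "; ".join(tokens)
print(result)  # Project Earth; Project Sun; Project Hello
```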
You can achieve this easily by using the RegEx tool in Tokenize mode, as shown below.
Let me know if it works as you want. 🙂
This is an article with more details about this tool: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-RegEx/ta-p/37689
Hi Emmanuel,
Thanks a lot for your help!
Would it actually be possible to extract two strings after "Project" since sometimes Project has a name which is longer than 1 word.
Thanks a lot!
Yes absolutely !
You can achieve this just by using the Text to Columns tool as shown below. You can specify all the separators you have in your string (as I did in the first screenshot) and the rest of the process will be automatic. 🙂
Let us know if it works as you want please.
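The same split-on-separators idea can be sketched in Python for testing. The assumptions here: the separators are the dash, colon and semicolon from the question, each record is its own string, and the multi-word names ("Project Earth Two", "Project Hello World") are made up for illustration. The helper name `project_names` is hypothetical too.

```python
import re

# Hypothetical records with one-word and multi-word project names.
records = [
    "xyz - Project Earth Two - some info - some more",
    "xyz - Project Sun - some info",
    "xyz : Project Hello World ; some info",
]

def project_names(line):
    """Split on dash, colon or semicolon and keep fields starting with 'Project'."""
    fields = [f.strip() for f in re.split(r"[-:;]", line)]
    return [f for f in fields if f.startswith("Project")]

for line in records:
    print(project_names(line))
```

Because the name is whatever sits between two separators, it can be any number of words long. The catch is that a project name containing one of the separators (a hyphenated name, for example) would be cut short.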
Hi @insomned
Here is one way to parse 2 or more words.
This will work even if we have 1 word after the Project or 2 words.
Input was:
Many thanks
Shanker V
hello ShankerV,
This seems to work for me. However, instead of splitting based on the initial characters as you have done, can it be coded to start at a specific word, say Project, and end at the next comma in the sentence? For example:
If the input is 25% Project Alpha, 50% Project Alpha Bravo Charlie, 25% Project Delta Echo, it should return the following:
Project Alpha
Project Alpha Bravo Charlie
Project Delta Echo
Would appreciate if you could assist.
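One possible pattern for this, offered as a sketch rather than a tested Alteryx workflow: match the word Project and then everything up to (but not including) the next comma, i.e. `\bProject\b[^,]*`. The Python below only checks the pattern against the example sentence; in Alteryx the same expression would presumably go into the RegEx tool in Tokenize mode, which is an assumption on my part.

```python
import re

text = "25% Project Alpha, 50% Project Alpha Bravo Charlie, 25% Project Delta Echo"

# "Project" as a whole word, then any run of characters that are not a comma.
pattern = r"\bProject\b[^,]*"

# strip() removes any trailing spaces left before a comma or at the end.
names = [m.strip() for m in re.findall(pattern, text)]

for name in names:
    print(name)
# Project Alpha
# Project Alpha Bravo Charlie
# Project Delta Echo
```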