Alteryx Designer Desktop Discussions

SOLVED

RegEx Tokenize Group of Words to Column

taylorbyers
6 - Meteoroid

I have text data consisting of sentences in a single column. I want to shrink this data down into a new column containing at most 7 words. Some rows contain fewer than 7 words and some contain more. I tried the regular expression below, but RegEx returns NULL whenever the text doesn't contain at least 7 words.

 

(\<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\>)

 

 

binuacs
20 - Arcturus

@taylorbyers Can you provide the input file and expected output?

taylorbyers
6 - Meteoroid

I cannot, due to confidentiality.

hellyars
13 - Pulsar

@binuacs 

 

I am not sure if I understood your question correctly, but I gave it a try...

Workflow attached.  I used two examples from the image you provided (one that produced a null and one that did not).

 

Steps:

 

1.  Assign a Record ID.  This is important when re-combining the data later.

2. I parsed the original text using (\w+) and set the RegEx tool to Tokenize with Split to Columns (number of columns = 7).

3. I flipped the data using the Transpose tool.

4. I then used a Summarize tool to recombine the text, grouped by Record ID, using the Concat function.
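
For anyone who wants to see the logic outside of Designer, here is a rough Python sketch of the same tokenize-then-concatenate idea. This is not the attached workflow: the function name `first_n_words`, the sample rows, and the use of a space as the separator are all assumptions made for illustration. Note that tokenizing with \w+ drops punctuation, so the recombined text contains only the words.

```python
import re

def first_n_words(text: str, n: int = 7) -> str:
    """Keep at most the first n words of text (hypothetical helper)."""
    # Tokenize: pull out each run of word characters, like (\w+) in step 2.
    tokens = re.findall(r"\w+", text)
    # Keep at most n tokens and join them back together (the Concat step).
    # Joining with a single space is an assumption for this sketch.
    return " ".join(tokens[:n])

# Made-up sample rows keyed by a record ID, standing in for step 1.
rows = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "Too short",
}

result = {record_id: first_n_words(text) for record_id, text in rows.items()}
```

Rows with fewer than 7 words simply come back whole instead of producing a null, which is the behaviour the original question was after.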

 

 


 

taylorbyers
6 - Meteoroid

This works well, thank you!

Christina_H
14 - Magnetar

If you need to do it with RegEx alone, try this:

 

^((?:\<\w+\>\s?+){1,7})

 

It will parse up to 7 words from the start of the string.
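
To try the idea outside of Designer, here is a minimal Python sketch of a similar pattern. It is an adaptation, not the exact expression above: Python's re module does not support \< and \>, so \b is used for both word boundaries, and the possessive \s?+ is written as plain \s? because possessive quantifiers are only available from Python 3.11. The names `UP_TO_SEVEN` and `leading_words` are made up for this example.

```python
import re

# Adapted pattern: \b replaces \< and \>, and \s? replaces the possessive \s?+.
UP_TO_SEVEN = re.compile(r"^((?:\b\w+\b\s?){1,7})")

def leading_words(text):
    """Return up to the first 7 words of text, or None if it does not start with a word."""
    match = UP_TO_SEVEN.match(text)
    if match is None:
        return None
    # The group may end with the whitespace after the last word, so trim it.
    return match.group(1).rstrip()
```

Because each word may be followed by at most one whitespace character and nothing else, matching stops at the first punctuation mark or double space, so a sentence like "Hello, world" yields only "Hello" here.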
