Alteryx Designer Desktop Discussions

SOLVED

RegEx Tokenize Group of Words to Column

taylorbyers
6 - Meteoroid

I have text data: sentences contained in a single column. I want to shrink each sentence into a new column holding a maximum of 7 words. Some rows contain fewer than 7 words and some contain more. I tried the regular expression below, but the RegEx tool returns NULL for any row that doesn't contain at least 7 words.

 

(\<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\>)

 

 

binuacs
21 - Polaris

@taylorbyers Can you provide the input file and expected output?

taylorbyers
6 - Meteoroid

I cannot due to confidentiality. 

hellyars
13 - Pulsar

@binuacs 

 

I am not sure if I understood your question correctly, but I gave it a try...

Workflow attached.  I used two examples from the image you provided (one that produced a null and one that did not).

 

Steps:

 

1.  Assign a Record ID.  This is important when re-combining the data later.

2. I parsed the original text using (\w+), with the RegEx tool set to Tokenize and Split to Columns (number of columns = 7).

3. I flipped the data using the Transpose tool.

4. I then used a Summarize tool to recombine the text, grouped by Record ID, using the Concatenate action.
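
For anyone who wants to see the same idea outside of a workflow, here is a rough Python sketch of the steps above. It is not the attached workflow, only an illustration. The sample rows, the function name first_n_words, and the choice of a single space as the separator are all assumptions of mine.

```python
import re

def first_n_words(text, n=7):
    """Hypothetical helper: keep at most the first n words of text."""
    # Tokenize: each run of word characters becomes one token, like (\w+).
    tokens = re.findall(r"\w+", text)
    # Keep up to n tokens and concatenate them (separator is an assumption).
    return " ".join(tokens[:n])

# Made-up sample rows keyed by a Record ID.
rows = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "Only three words",
}

result = {record_id: first_n_words(text) for record_id, text in rows.items()}
print(result[1])  # The quick brown fox jumps over the
print(result[2])  # Only three words
```

Because the tokens come from \w+, punctuation between words is dropped when the text is recombined, so the output may not match the original sentence character for character.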

 

 


 

taylorbyers
6 - Meteoroid

This works well, thank you!

Christina_H
14 - Magnetar

If you need to do it with RegEx alone, try this:

 

^((?:\<\w+\>\s?+){1,7})

 

It will parse up to 7 words from the start of the string.
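
To try the pattern outside Alteryx, a minimal Python sketch could look like the one below. Two things are changed and should be treated as assumptions: Python's re module does not support the \< and \> word-boundary anchors, so \b is used in their place, and the possessive \s?+ is written as plain \s? so it also runs on Python versions before 3.11. The names UP_TO_SEVEN and leading_words are hypothetical.

```python
import re

# Adapted pattern: \b replaces \< and \>, and \s? replaces the possessive \s?+.
UP_TO_SEVEN = re.compile(r"^((?:\b\w+\b\s?){1,7})")

def leading_words(text):
    """Hypothetical helper: return up to 7 leading words, or None if no match."""
    match = UP_TO_SEVEN.match(text)
    # Strip the trailing whitespace the last repetition may have consumed.
    return match.group(1).strip() if match else None

print(leading_words("The quick brown fox jumps over the lazy dog"))
# The quick brown fox jumps over the
print(leading_words("Only three words"))
# Only three words
```

Note that the match stops at the first punctuation mark, since only a single optional whitespace character is allowed between words. For example, "Hello, world" yields just "Hello".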
