
Alteryx Designer Desktop Discussions

SOLVED

RegEx Tokenize Group of Words to Column

taylorbyers
6 - Meteoroid

I have textual data consisting of sentences in a single column. I want to shorten this data into a new column containing at most 7 words. Some rows contain fewer than 7 words and some contain more. I tried the regular expression below, but the RegEx tool returns NULL whenever the text doesn't contain at least 7 words.

 

(\<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\>)
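To see why the NULLs appear, here is a minimal Python sketch of the same pattern. Python's re module has no \< and \> anchors, so \b stands in for them, and first_seven is a hypothetical name. A pattern that spells out exactly seven words has nothing to match in shorter text, so the match comes back empty, which Alteryx's Parse output presumably surfaces as NULL.

```python
import re

# Python's re has no \< or \> anchors, so \b (word boundary) stands in for them.
WORD = r"\b\w+\b"
# Exactly seven words separated by single spaces, mirroring the original pattern.
SEVEN_WORDS = re.compile("(" + " ".join([WORD] * 7) + ")")

def first_seven(text):
    """Return the first run of exactly seven words, or None when there is none."""
    match = SEVEN_WORDS.search(text)
    return match.group(1) if match else None

print(first_seven("one two three four five six seven eight"))  # one two three four five six seven
print(first_seven("only three words"))  # None
```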

 

 

6 REPLIES
binuacs
21 - Polaris

@taylorbyers Can you provide the input file and expected output?

taylorbyers
6 - Meteoroid

I can't share it due to confidentiality.

hellyars
13 - Pulsar

@binuacs 

 

I am not sure if I understood your question correctly, but I gave it a try...

Workflow attached.  I used two examples from the image you provided (one that produced a null and one that did not).

 

Steps:

 

1. I assigned a Record ID. This is important when recombining the data later.

2. I parsed the original text using (\w+) and set the RegEx tool to Tokenize, Split to columns, with Number of Columns set to 7.

3. I flipped the data using the Transpose tool.

4. I then used a Summarize tool to recombine the text, grouping by Record ID and using the Concatenate action.
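The four steps can be sketched in plain Python to show how the data flows through the workflow. This is only an illustration, not the attached workflow: the function names and the column names col1 to col7 are hypothetical, and skipping null padding values when concatenating is an assumption about how shorter rows should be handled.

```python
import re

def tokenize_to_columns(text, n_columns=7):
    """Mimic Tokenize / Split to columns: keep the first n_columns word matches, pad with None."""
    tokens = re.findall(r"\w+", text)[:n_columns]
    return tokens + [None] * (n_columns - len(tokens))

def shorten(texts, n_columns=7):
    """Sketch of the four-step workflow; returns {record_id: shortened_text}."""
    # Step 1: assign a record ID to each input row.
    records = list(enumerate(texts, start=1))
    # Step 2: tokenize each row into a fixed number of columns.
    wide = [(rid, tokenize_to_columns(text, n_columns)) for rid, text in records]
    # Step 3: transpose to long form, one (record ID, column name, value) row per column.
    long_rows = [
        (rid, f"col{i}", value)
        for rid, columns in wide
        for i, value in enumerate(columns, start=1)
    ]
    # Step 4: group by record ID and concatenate the non-null values with a space.
    grouped = {rid: [] for rid, _ in records}
    for rid, _column, value in long_rows:
        if value is not None:
            grouped[rid].append(value)
    return {rid: " ".join(values) for rid, values in grouped.items()}

print(shorten(["the quick brown fox jumps over the lazy dog", "two words"]))
# {1: 'the quick brown fox jumps over the', 2: 'two words'}
```

Because the tokens come from (\w+), punctuation is dropped and a contraction such as don't becomes the two tokens don and t.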

taylorbyers
6 - Meteoroid

This works well, thank you!

Christina_H
14 - Magnetar

If you need to do it with RegEx alone, try this:

 

^((?:\<\w+\>\s?+){1,7})

 

It will parse up to 7 words from the start of the string.
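For comparison, here is a Python sketch of the single-regex approach. Python's re lacks \< and \>, so \b is used instead, and the possessive \s?+ becomes a plain \s? (possessive quantifiers only exist in Python 3.11 and later); up_to_seven_words is a hypothetical name. I have not verified that Alteryx's engine behaves identically on every input.

```python
import re

# \b replaces the \< and \> word anchors; {1,7} takes between one and seven words,
# each optionally followed by a single whitespace character.
UP_TO_SEVEN = re.compile(r"^((?:\b\w+\b\s?){1,7})")

def up_to_seven_words(text):
    """Return up to the first seven words of text, or None if it does not start with a word."""
    match = UP_TO_SEVEN.match(text)
    # The captured group can end with the space after the last word, so strip it.
    return match.group(1).rstrip() if match else None

print(up_to_seven_words("one two three four five six seven eight nine"))  # one two three four five six seven
print(up_to_seven_words("only three words"))  # only three words
```

Because the match is anchored at ^ and each word may be followed by at most one whitespace character, it stops at the first punctuation mark or double space; for example, "Hello, world" yields just Hello.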
