I have textual data (sentences) contained in a single column. I am looking to shrink this data down into a new column with a maximum of 7 words. Some rows contain fewer than 7 words and some contain more. I tried to use this regular expression, but RegEx returns NULL for the new column if the text doesn't contain at least 7 words.
(\<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\>)
@taylorbyers Can you provide the input file and expected output?
I cannot due to confidentiality.
I am not sure if I understood your question correctly, but I gave it a try...
Workflow attached. I used two examples from the image you provided (one that produced a null and one that did not).
Steps:
1. Assign a Record ID. This is important when re-combining the data later.
2. I parsed the original text using (\w+) and set the Regex Tool to Tokenize and Split by Columns (=7).
3. I flipped the data using the Transpose tool.
4. I then used a Summary tool to recombine the text, grouped by Record ID, using the Concat function.
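
For anyone who wants to check the logic outside of Alteryx, here is a rough Python sketch of the same idea as the steps above: tokenize on (\w+), keep at most 7 tokens, and join them back per record. The function name first_n_words, the sample rows, and the single-space separator are my own assumptions for illustration, not part of the attached workflow.

```python
import re

def first_n_words(text, n=7):
    """Return at most the first n words of text (hypothetical helper)."""
    # Tokenize: every run of word characters, like (\w+) in Tokenize mode.
    tokens = re.findall(r"\w+", text)
    # Keep at most n tokens and join them with a space (the Concat step).
    return " ".join(tokens[:n])

# Made-up sample data keyed by Record ID: one long row, one short row.
rows = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "Too short",
}
result = {record_id: first_n_words(text) for record_id, text in rows.items()}
```

Note that tokenizing on \w+ drops punctuation, so the recombined text may not match the original character for character.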
This works well, thank you!
If you need it in just RegEx, try this:
^((?:\<\w+\>\s?+){1,7})
It will parse up to 7 words from the start of the string.
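
If you want to test the pattern outside of Alteryx, the following is a minimal Python sketch of an equivalent expression. Two substitutions are assumptions on my part: Python's re module does not treat \< and \> as word boundaries, so \b is used instead, and the possessive \s?+ is written as plain \s? so it also runs on older Python versions. The names UP_TO_SEVEN and truncate are hypothetical.

```python
import re

# \b stands in for \< and \>; \s? replaces the possessive \s?+.
UP_TO_SEVEN = re.compile(r"^((?:\b\w+\b\s?){1,7})")

def truncate(text):
    """Return up to the first 7 words, or None if the text starts with no word."""
    match = UP_TO_SEVEN.match(text)
    if match is None:
        return None
    # The last word may carry one trailing whitespace character; strip it.
    return match.group(1).rstrip()
```

Because {1,7} accepts anywhere from one to seven repetitions, short sentences are returned whole instead of producing a NULL. The match stops at the first punctuation mark between words, since only a single optional whitespace character is allowed after each word.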