I have text data: sentences contained in a single column. I am looking to shrink this data down into a new column with a maximum of 7 words. Some values contain fewer than 7 words and some contain more. I tried to use this regular expression, but RegEx returns a NULL column if the value doesn't contain at least 7 words.
(\<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\> \<\w+\>)
@taylorbyers Can you provide the input file and expected output?
I cannot due to confidentiality.
I am not sure if I understood your question correctly, but I gave it a try...
Workflow attached. I used two examples from the image you provided (one that produced a null and one that did not).
Steps:
1. Assign a Record ID. This is important when re-combining the data later.
2. I parsed the original text using (\w+) and set the Regex Tool to Tokenize and Split by Columns (=7).
3. I flipped the data using the Transpose tool.
4. I then used a Summary tool to recombine the text, grouped by Record ID, using the Concat function.
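For anyone who wants to check the logic outside of a workflow, the steps above can be sketched roughly in Python. This is only an approximation of what the tools do, not the attached workflow itself: the function name `truncate_by_tokenizing`, the `max_words` parameter, and the single-space separator for the concatenation are my own assumptions.

```python
import re

def truncate_by_tokenizing(rows, max_words=7):
    """Keep at most `max_words` words per row, keyed by a 1-based record ID."""
    result = {}
    # Step 1: assign a record ID to each row (hypothetical 1-based numbering).
    for record_id, text in enumerate(rows, start=1):
        # Step 2: tokenize with (\w+) and keep only the first `max_words` tokens.
        tokens = re.findall(r"\w+", text)[:max_words]
        # Steps 3-4: recombine the tokens for this record ID (space separator assumed).
        result[record_id] = " ".join(tokens)
    return result

sample = ["one two three four five six seven eight nine", "only three words"]
print(truncate_by_tokenizing(sample))
```

Note that tokenizing on `\w+` drops punctuation, so "Hello, world" comes back as "Hello world".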
This works well, thank you!
If you need it in just RegEx, try this:
^((?:\<\w+\>\s?+){1,7})
It will parse up to 7 words from the start of the string.
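Here is a quick way to try the same idea in Python. This is a translation and not the exact expression above: Python's `re` module has no `\<` and `\>` anchors, so `\b` stands in for both, and the possessive `\s?+` is written as a plain `\s?` so it runs on older Python versions. The name `first_seven_words` is just for illustration.

```python
import re

# \b replaces \< and \> because Python's re module does not support them.
# {1,7} is what lets shorter sentences match instead of returning nothing.
UP_TO_SEVEN = re.compile(r"^((?:\b\w+\b\s?){1,7})")

def first_seven_words(text: str) -> str:
    """Return up to the first 7 words of `text`, or "" when nothing matches."""
    match = UP_TO_SEVEN.match(text)
    # The captured group can end with the whitespace after the last word.
    return match.group(1).rstrip() if match else ""

print(first_seven_words("one two three four five six seven eight nine"))
print(first_seven_words("only three words"))
```

One caveat with this pattern: it stops at the first character that is not a word character or a single whitespace, so a sentence with a comma after the second word would only return those two words.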