Hi Community members from a wet and dull Oxfordshire 😀
I have a column with plenty of string text contained within it, and I'm trying to extract the names. I was really hoping to use the new Named Entity Recognition tool, which I think may have helped, but it's not available yet.
Anyway, I have created some dummy text by way of example, and I wondered what the best approach would be when the number of names and their position change throughout the text, as does the other text around those names. Do I need to get to grips with Regex? 😯
Thank you
Justin
hey @Justin53Q
Greetings from a very wet and dull Newcastle!
Will your data be coming in from a docx file? Or is this just for example's sake?
Reading a docx into Alteryx may require some Python/macro ability. Also, you're totally right, named entity recognition might be useful here; however, I'm sure we can find a solution before the release of that tool!
Cheers,
TheOC
Hi @Justin53Q ,
With regex you are relying on a certain consistency within the string. That's not to say it can't have an element of dynamism. For example, in the text string you provided, all names are a single uppercase letter followed by a space followed by a capitalised string of letters. That is also why it's probably not a very good example, as it's unlikely to be representative of your dataset.
If it is, the regex string would simply be something like:
(\u\s\u.*?\>)
Used in a tokenise function:
This will split to rows on each Name:
However, this is unlikely to be the case.
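For anyone wanting to try the same idea outside Alteryx, here is a rough Python sketch of the pattern above. Python's `re` module does not support `\u` or `\>`, so I have assumed `[A-Z]` and `\b` as the nearest equivalents. The function name `extract_names` and the sample sentence are made up purely for illustration, and it only works if names really do follow the "initial, space, surname" shape.

```python
import re

# Approximate Python translation of (\u\s\u.*?\>):
#   \u -> [A-Z]  (one uppercase letter)
#   \> -> \b     (assumed equivalent here: boundary at the end of the word)
NAME_PATTERN = re.compile(r"[A-Z]\s[A-Z].*?\b")


def extract_names(text: str) -> list[str]:
    """Return every 'initial + space + capitalised word' match, one per item.

    This mirrors a tokenise-to-rows step: each match becomes its own entry.
    """
    return NAME_PATTERN.findall(text)


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the original dataset.
    sample = "Meeting attended by J Smith and A Jones, chaired by P Brown."
    for name in extract_names(sample):
        print(name)
```

Like the original expression, this will also pick up things that are not names (an uppercase letter at the end of an acronym followed by a capitalised word, for instance), so treat it as a starting point rather than a finished solution.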
Hope this gets you off and running.
M.
Thank you @theOC
The string data is contained in one Excel sheet. Apologies for not making that clear.
Justin
Thank you binuacs
Thank you mceleavey