Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Searching For Multiple Keywords Within Rows of Text

nspare
6 - Meteoroid

I have a dataset with ~300k rows that has a "Job Description" field which contains free form text entries describing construction projects. I need to parse the whole database to find observations that contain references to Ceiling, Drywall, Insulation, Hardware, Flooring, and Concrete. I need to use wildcards to allow for misspellings. I've figured out how to make it do what I need it to do for ONE keyword, but I don't know enough about RegEx to know how to make it use a logical OR to look for more than one keyword at a time per cell.

 

Here's what I've got. Each one of these lines is using the formula tool and creating a new output column:

 

 

Contains(('ceiling'), [Text])

iif(REGEX_Match([Text], '.*ceil.*'),'Ceiling'," ")

REGEX_CountMatches([Text],'Ceiling')

 

How can I modify this to search for more than one term at a time? Thanks in advance!

 

8 REPLIES 8
LordNeilLord
15 - Aurora

Hi,

 

For your Contains formula you can use Contains('ceiling', [Text]) OR Contains('drywall', [Text]) OR Contains('insulation', [Text]) etc..

 

For the RegEx formulas you can create a capturing group for example: REGEX_CountMatches([Product Name], "(Xerox|&|Black)")

andre347
10 - Fireball

What about using a lookup table and the Find Replace tool? This means you only have to update the lookup table if you want to look for new words rather than having to maintain a formula.

 

I mocked up a very simple solution (v11.7 attached).


Screen Shot 2017-12-01 at 11.07.33.png

 

nspare
6 - Meteoroid

I like this solution, because it's the most reusable. The only problem is that sometimes the text entries might have multiple keywords, and this solution only returns the first keyword in the text string. I would need to either have a count column that is updated each time a keyword is found, or I would need to have each additional keyword generate a new column to show secondary or tertiary segments. Any thoughts? Thanks so much for your help! 

JoeS
Alteryx Alumni (Retired)

If you split your data to different rows based upon a space you can then use the find and replace tool for each word. 

 

Adding a record ID allows you to join back or group by depending on your final output needed. I have had a bit of a guess at what you may want.

01-12-2017 16-05-49.png

 

nspare
6 - Meteoroid

Thank you, but I think the text fields that I'm analyzing are too long to separate all of them into different rows after each space. It would create an unwieldy amount of rows. I will definitely use this in the future, though! 

JoeS
Alteryx Alumni (Retired)

I think that should be OK, Alteryx can cope with a lot of records (I have seen processes in the billions, Ned has also Tweeted that he processed a Trillion before as well here)

 

A lot of it will depend on the spec of your machine though.

nspare
6 - Meteoroid

Thanks for your help, now I know!

nspare
6 - Meteoroid

Oh wow, that is super useful. My work machine is grossly underpowered - but I will give it a try and see how long it takes!

Labels