Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

finding phrases in sentences

harend
8 - Asteroid

I am trying to count the number of times that a phrase occurs in a series of sentences. For example, I am trying to find how many times someone entered the phrase "you rock" or "you are terrible" into a free form text field. I've currently tried to use the fuzzy matching tool and tried to count the number of times that phrase appears however it doesn't seem to work with phrases that have multiple words. Another note is that I originally used the fuzzy match tool because spelling was an issue in the searched (people entered) field. Any ideas on this? I'm able to get it to return the correct numbers if there is only one word that is being searched for however that won't work for this because the negations (not great vs great) are proving to be more challenging that I had originally thought.

 

Any feedback or questions are more than welcome! It's a little hard due to the sensitive nature of the info.

8 REPLIES 8
MarqueeCrew
20 - Arcturus
20 - Arcturus
Will you be at inspire next week? I could work with you on this there.
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
RodL
Alteryx Alumni (Retired)

Yes, the use cases I've seen for fuzzy matching is more pertinent to trying to find duplicate names in data to consolidate into one same name, not finding phrases interspersed within a text string.

 

A starting point if you are interested in getting "exact matches" to a phrase would be to use the Find Replace tool. 

You bring your data with the string field into the 'F' side of the tool and you can create a list of search phrases in another file (or even a Text Input tool for starters), and then configure it to find "Any Part of the Field". Select the radio button to "Append Field(s) to Record" and check your search phrase column name. 

It will provide you with your records and an appended field with the phrase it matched. You could then use a Summarize tool to group and count the occurrences. 

It will only find exact matches, but I could see where you could end up developing a list of "close matches" that map to one phrase, use the Find Replace tool to first actually replace, and then do the above process to count. 

dataMack
12 - Quasar

The approach outlined by @RodL is a good one if you are looking to exactly match very specific phrases.  The only thing I would add to that approach would be to use a formula tool and take everything to lowercase first to improve the chances for a match.

 

Based on the problem you are describing, it sounds like what you may really be trying to do is text analytics- things like sentiment analysis to determine if a comment was positive or negative or identfy key phrases.  If this is really what you are after then you can use something like the Azure ML Text analytics API, which you can use Alteryx to call and send you data for scoring.

https://azure.microsoft.com/en-us/documentation/articles/machine-learning-apps-text-analytics/

 

harend
8 - Asteroid

I will be at inspire however I need this analysis to be finished by the beginning of tomorrow. I'd be happy to bring my solution along though and show you!

harend
8 - Asteroid

Per the suggestion of another Alteryx Employee, I think the best bet is to split all the text into columns, with some sort of identifier to show which comments go together, then fuzzy match against a large list of english words. That should produce a 'spell checked' version of the feedback and then I can do the find and replace tool to find the phrases we are looking for. 

MarqueeCrew
20 - Arcturus
20 - Arcturus
I might try soundex and look for groupings of values.
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
jgo
Alteryx Alumni (Retired)

The function REGEX_CountMatches() may be a solution.

 

REGEX_CountMatches([Field1], 'you rock') would return however many times "you rock" was found. CASE is insenstive by default, but can be adjusted by adding in the optional argument.

harend
8 - Asteroid
@lauras I met with @marqueecrew and we were able to come up with a solution. Thanks everyone for your help! It took a combination of everyone's suggestions!
Labels