Alteryx Designer Desktop Discussions

hellyars · ‎05-25-2022

I have a Title field. I want to count the occurrence of each word across thousands of titles.

Step 1: Text to Columns tool set to Split to Rows, using \s as the delimiter

Step 2: Summarize tool set to Group By and Count

The challenge I am running into is that some titles are all Upper Case, some are Title Case, and some contain acronyms that are Upper Case and need to be preserved. As currently configured, Group By would treat RED and Red as different words. Can you change Group By to ignore capitalization?

TITLE_FIELD

Red Storm Rising

RED STORM Rising

Red Storm Rising (RSR)

IraWatt · ‎05-25-2022

Hey @hellyars,

Before the Summarize tool, use the Data Cleansing tool.

You can change the case, remove punctuation, etc. See Data Cleansing Tool | Alteryx Help
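Outside Alteryx, the same idea (clean up case and punctuation first, then group and count) can be sketched in Python. This is only an illustration, not the Data Cleansing tool itself; the function name `count_words_ignoring_case` is made up for the example, and the sample titles are the ones from the question.

```python
import re
from collections import Counter

titles = [
    "Red Storm Rising",
    "RED STORM Rising",
    "Red Storm Rising (RSR)",
]

def count_words_ignoring_case(titles):
    """Count words across titles after removing punctuation and lowercasing."""
    counts = Counter()
    for title in titles:
        # Drop anything that is not a word character or whitespace, e.g. parentheses
        cleaned = re.sub(r"[^\w\s]", "", title)
        # Split on whitespace (like the \s delimiter) and normalize the case
        for word in cleaned.split():
            counts[word.lower()] += 1
    return counts

counts = count_words_ignoring_case(titles)
```

Note that this lowercases everything, so acronyms lose their capitalization as well.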

Any questions or issues please ask :)
HTH!
Ira

DataNath · ‎05-25-2022

I think regardless of the method, it's unlikely that you'll be able to maintain the formatting if you want to perform a count. Afaik, Alteryx groups them exactly as they appear.

It's a pretty long shot, but do you have a list of titles in their simple form that you could use as a lookup? If so, you could match and use those for the count. Something like this:
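As a rough illustration outside Alteryx, the lookup idea could be sketched in Python as below. The `canonical_forms` list is a hypothetical stand-in for whatever clean list you have, and `count_with_lookup` is a name made up for this example.

```python
from collections import Counter

# Hypothetical lookup list holding each word in the form you want to report
canonical_forms = ["Red", "Storm", "Rising", "RSR"]

words = ["Red", "Storm", "Rising", "RED", "STORM", "Rising",
         "Red", "Storm", "Rising", "(RSR)"]

def count_with_lookup(words, canonical_forms):
    """Match each word to its canonical form case-insensitively, then count."""
    lookup = {form.lower(): form for form in canonical_forms}
    counts = Counter()
    for word in words:
        stripped = word.strip("()")
        # Fall back to the word as written when it is not in the lookup
        counts[lookup.get(stripped.lower(), stripped)] += 1
    return counts

lookup_counts = count_with_lookup(words, canonical_forms)
```

The match is done on the lowercased word, but the count is reported under the form stored in the lookup, so the formatting you care about survives.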

hellyars · ‎05-25-2022

@IraWatt I know. I am trying to avoid the complication of the acronyms that need to be preserved. I can extract those using RegEx, but I was hoping there was a simple on/off/ignore option in the Summarize tool's Group By function.

...and sometimes that Upper Case has to be preserved because it denotes meaning

...also I am not really working with book titles @DataNath
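For reference, the acronym-preserving behavior described above could be sketched in Python like this. It rests on a loud assumption: acronyms are identified as two or more capital letters inside parentheses, as in the (RSR) sample. The pattern and the function name `count_preserving_acronyms` are hypothetical, and real data may need a different rule.

```python
import re
from collections import Counter

titles = [
    "Red Storm Rising",
    "RED STORM Rising",
    "Red Storm Rising (RSR)",
]

# Assumption: an acronym is 2+ capital letters wrapped in parentheses
ACRONYM_PATTERN = re.compile(r"\(([A-Z]{2,})\)")

def count_preserving_acronyms(titles):
    """Group words case-insensitively, except for known acronyms."""
    # First pass: collect the acronyms that appear anywhere in the data
    acronyms = set()
    for title in titles:
        acronyms.update(ACRONYM_PATTERN.findall(title))

    # Second pass: lowercase ordinary words, keep acronyms as written
    counts = Counter()
    for title in titles:
        for token in title.split():
            word = token.strip("()")
            key = word if word in acronyms else word.lower()
            counts[key] += 1
    return counts

acronym_counts = count_preserving_acronyms(titles)
```

One limitation: an ordinary word written in all caps that happens to match a collected acronym would be counted under the acronym.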

rfoster7 · ‎05-25-2022

Since you aren't actually grouping book titles, I would recommend using the Fuzzy Match tool to create a "field to group by" for close matches. Then you can group by that field to get your count. Open the Fuzzy Match tool example and look at the third one down. It might meet your needs.
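To show the general idea of a "field to group by" built from close matches, here is a minimal Python sketch. It uses the standard library's `difflib.get_close_matches`, which is a different matching method from the Alteryx Fuzzy Match tool, so treat it as an analogy only. The function name `assign_group_keys`, the 0.8 cutoff, and the misspelled third sample value are all assumptions made for the example.

```python
from collections import Counter
from difflib import get_close_matches

def assign_group_keys(values, cutoff=0.8):
    """Give each value the key of the first earlier value it closely matches."""
    keys = []      # distinct group keys seen so far
    assigned = []  # group key chosen for each input value, in order
    for value in values:
        normalized = value.lower()
        match = get_close_matches(normalized, keys, n=1, cutoff=cutoff)
        if match:
            assigned.append(match[0])
        else:
            # No close match yet, so this value starts a new group
            keys.append(normalized)
            assigned.append(normalized)
    return assigned

# The third value contains a deliberate typo to show a near match
values = ["Red Storm Rising", "RED STORM Rising", "Red Storm Risng"]
group_keys = assign_group_keys(values)
fuzzy_counts = Counter(group_keys)
```

Grouping by `group_keys` instead of the raw values puts all three variants in one bucket. The cutoff controls how loose the matching is and would need tuning on real data.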

Can You Set Group By to Ignore Case