I have a Title field. I want to count the occurrence of each word across thousands of titles.
Step 1: Text to Columns tool set to Split to Rows, using \s as the delimiter
Step 2: Summary tool set to Group By and Count
The challenge I am running into is that some titles are all Upper Case, some are Title Case, and some contain acronyms that are Upper Case and need to be preserved. As currently configured, Group By would treat RED and Red as different words. Can you change Group By to ignore capitalization?
TITLE_FIELD |
Red Storm Rising |
RED STORM Rising |
Red Storm Rising (RSR) |
Hey @hellyars,
Before the Summarize tool, use the Data Cleansing tool.
You can change the case, remove punctuation, etc. See: Data Cleansing Tool | Alteryx Help
Any questions or issues please ask :)
HTH!
Ira
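For anyone who wants to see the logic outside of Alteryx, here is a minimal Python sketch of the same idea: split on whitespace, strip punctuation, fold case, then count. It is only an illustration of the cleanse-then-group approach, not an Alteryx feature, and the function name is made up for the example. Note that it lowercases everything, so acronyms such as RSR are not preserved.

```python
import re
from collections import Counter

# Sample titles taken from the question above.
titles = [
    "Red Storm Rising",
    "RED STORM Rising",
    "Red Storm Rising (RSR)",
]

def count_words_ignoring_case(titles):
    """Count words across titles, treating RED and Red as the same word."""
    counts = Counter()
    for title in titles:
        for token in title.split():  # split on whitespace, like the \s delimiter
            word = re.sub(r"[^\w]", "", token).lower()  # drop punctuation, fold case
            if word:
                counts[word] += 1
    return counts

counts = count_words_ignoring_case(titles)
```

With the three sample titles, "red", "storm" and "rising" each count 3, and "rsr" counts 1.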
I think regardless of the method, it's unlikely that you'll be able to maintain the formatting if you want to perform a count. AFAIK, Alteryx groups values exactly as they appear.
It's a long shot, but do you have a list of titles in their simple form that you could use as a lookup? If so, you could match against it and use those values for the count.
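A rough Python sketch of the lookup idea, in case it helps to see it written out. This is not the Alteryx workflow itself; the lookup list and the function name are hypothetical. Each word is matched to the lookup on its lower-case form, and the count is reported under the canonical spelling from the lookup.

```python
from collections import Counter

# Hypothetical lookup list holding the preferred ("simple") spelling of each word.
canonical_words = ["Red", "Storm", "Rising", "RSR"]
lookup = {word.lower(): word for word in canonical_words}

def count_with_lookup(words, lookup):
    """Count words under their canonical spelling; unmatched words stay as-is."""
    counts = Counter()
    for word in words:
        counts[lookup.get(word.lower(), word)] += 1
    return counts

words = ["Red", "RED", "Storm", "STORM", "Rising", "RSR"]
result = count_with_lookup(words, lookup)
```

Because the display form comes from the lookup, an acronym keeps its upper-case spelling as long as it is listed there.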
@IraWatt I know. I am trying to avoid the complication of the Acronyms that need to be preserved. I can extract those using Regex, but I was hoping there was a simple on/off/ignore feature/behavior of the Summary tool's Group By function.
...and sometimes that Upper Case has to be preserved because it denotes meaning
...also I am not really working with book titles @DataNath
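To make the acronym complication concrete, here is a hedged Python sketch that extracts acronyms with a regex first and then folds the case of everything else. It assumes acronyms show up in parentheses somewhere in the data, as in "(RSR)" in the sample; that assumption, the pattern, and the function names are illustrative only.

```python
import re
from collections import Counter

titles = [
    "Red Storm Rising",
    "RED STORM Rising",
    "Red Storm Rising (RSR)",
]

# Assumption: an acronym is two or more capital letters inside parentheses.
ACRONYM_PATTERN = re.compile(r"\(([A-Z]{2,})\)")

def find_acronyms(titles):
    """Collect every parenthesised all-caps token seen in any title."""
    found = set()
    for title in titles:
        found.update(ACRONYM_PATTERN.findall(title))
    return found

def count_words_preserving_acronyms(titles):
    """Count words case-insensitively, except for the extracted acronyms."""
    acronyms = find_acronyms(titles)
    counts = Counter()
    for title in titles:
        for token in title.split():
            word = re.sub(r"[^\w]", "", token)  # strip punctuation such as ( )
            if not word:
                continue
            key = word if word in acronyms else word.lower()
            counts[key] += 1
    return counts

counts = count_words_preserving_acronyms(titles)
```

Here RED and Red fall into the same group while RSR keeps its capitals. The weak spot is a title that is entirely upper case: an ordinary word in parentheses would be mistaken for an acronym, so a reviewed acronym list is probably safer than the pattern alone.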
Since you aren't actually grouping book titles, I would recommend looking into the Fuzzy Match tool to create a "group by" field for close matches. You can then group by that field to get your count. Open the Fuzzy Match tool example and look at the third one down. It might meet your needs.
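As a rough illustration of the "create a field to group by" idea, here is a small Python sketch that uses the standard library's difflib as a stand-in. It is not the Alteryx Fuzzy Match tool and will not reproduce its match scores; the 0.8 cutoff and the function name are assumptions chosen for the example.

```python
from collections import Counter
from difflib import get_close_matches

def assign_group_keys(words, cutoff=0.8):
    """Give each word a group key: the first earlier word it closely matches."""
    representatives = []  # one lower-cased representative per group
    keys = []
    for word in words:
        folded = word.lower()
        match = get_close_matches(folded, representatives, n=1, cutoff=cutoff)
        if match:
            keys.append(match[0])  # join the existing group
        else:
            representatives.append(folded)  # start a new group
            keys.append(folded)
    return keys

words = ["Red", "RED", "Storm", "Storms", "Rising"]
keys = assign_group_keys(words)
counts = Counter(keys)
```

In this example "Storms" lands in the same group as "Storm", which is the kind of close match that exact grouping would miss. A lower cutoff merges more aggressively, so it is worth checking the groups before trusting the counts.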