Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Freeform field -Need to clean and combine variations of same group name into one

bmcmanu1
5 - Atom

Hello, I have a data set that includes various names of individuals and businesses and I want to be able to get the name field cleaned up and group all the variations of the same name or business and remove the unnecessary characters and combine the name into 1 group and then summarize the number of transaction IDs I have for that individual. I have a lot of records with a lot of different names spelled in various ways so I would prefer not to have to do a contains calc for each name. Below I have included the before and after of what I am looking for.  Also I would then like to tie it back to the main data source to use it as the new Business name field.

 

Before (current Data):

Business Name                                       Transaction ID

Advanced MedBBB33
Advd Med (Local)TT22
Advanced Medicine (not Local)TP89
Ben Frankabc123
Benjamin Franklindef123
Benjamin Franklinghi23
Advanced Medicine (not Local)GGG22
Advanced Medicine (not Local)2223HH222
Advanced Medicine (not Local)II222

 

 

After(What I would like to see)

Business NameCount of transaction IDs
Advanced Medicine5
Benjamin Franklin3

 

4 REPLIES 4
ShankerV
17 - Castor

Hi @bmcmanu1 

 

One way of doing this is with the help of lookup table.

 

ShankerV_0-1676316066778.png

 

Many thanks

Shanker V

bmcmanu1
5 - Atom

What would you recommend though if you have over 5000 records with a lot more business names. I have tried the fuzzy match tool and grouping but I cannot get it to work properly, and I end of with still different variations of business names I would like to have combined.

ShankerV
17 - Castor

Hi @bmcmanu1 

 

I can still recommend the lookup table as the names are repeated many times and with limited lookup values you will be able to achieve the output.

 

As I still do not have the insight on your 5000 records, recommending the same lookup value based on the sample sahred.

 

Many thanks

Shanker V

 

alisonpitt
11 - Bolide

Hi @bmcmanu1 - this one is a challenge! I came up with something that will get you most of the way there, I think, but you will still have to use a little manual intervention. I'd recommend maybe adding in some steps in the middle to alert you if you get names that can't be categorized. Anyway, these are the basic steps I used:

- Regex to get rid of anything in parentheses and after

- A manual table of exceptions to handle things that can't be matched (in this case, the only one I just couldn't get to match was "Advd" - if you replace that with "Advanced" it matches)

- Fuzzy match by "Name w/Nicknames" (I'm sure you could tweak this for better matches, too)

- Replace original names with the matched names and count

 

I hope this helps!! Maybe someone with more Fuzzy Match experience could get you the rest of the way there.

Spoiler
fuzzy-match.png
Labels