Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Multi Row Name cleansing

spencer046
8 - Asteroid

Hello dear people,

 

I have the below dataset of different versions of company names for a particular location key group (Los Angeles_CA_90023). Currently I have "Revised Name 3" that replaces all the names in "Revised Name 2" column with the one with highest count. Now, while there are different company name versions of the "Miele" company, some of the company names are entirely different, and I want to replace the names based on either the highest count of their own name similarities (maybe based on the first word or two of the entire name), or leave it as it is (in case there is no potential similarity). My Desired outcome is shown on the right most column. Need help badly!

 

Location KeyRevised Name 1Revised Name 2Revised Name 3CountDesired outcome
Los Angeles_CA_90023Miele international companyMiele international companyMiele international company6466Miele international company
Los Angeles_CA_90023MIELE DEPENDABLE CENTERMIELE DEPENDABLE CENTERMiele international company18Miele international company
Los Angeles_CA_90023MIELE DEPENDABLE DISTRIMIELE DEPENDABLE DISTRIMiele international company16Miele international company
Los Angeles_CA_90023LOS ANGELES SERVICE CENTERLOS ANGELES SERVICE CENTERMiele international company11LOS ANGELES SERVICE CENTER
Los Angeles_CA_90023LOS ANGELES SERVICELOS ANGELES SERVICEMiele international company9LOS ANGELES SERVICE CENTER
Los Angeles_CA_90023DDC CALIFORNIADDC CALIFORNIAMiele international company4DDC CALIFORNIA
Los Angeles_CA_90023Dependable Distribution CenterDependable Distribution CenterMiele international company3Dependable Distribution Center
Los Angeles_CA_90023BAY CITIES C O BAY CITIES C O Miele international company2BAY CITIES C O 
Los Angeles_CA_90023BAY CITIES C O BARBARABAY CITIES C O BARBARAMiele international company1BAY CITIES C O 
Los Angeles_CA_90023MIELE DEPENDABLEMIELE DEPENDABLEMiele international company1Miele international company
Los Angeles_CA_90023MIELE DEPENDABLE LOGISTCSMIELE DEPENDABLE LOGISTCSMiele international company1Miele international company
3 REPLIES 3
AngelosPachis
16 - Nebula

Hello @spencer046 ,

 

Do you mind giving the following workflow a try. If I understand the logic correctly, you want to see if the names under Revised Name 3 have any similarity to those of Revised Name 2

 

If there is no similarity, keep Revised Name 2 as an output, else maintain Revised Name 3.

 

AngelosPachis_0-1610609615775.png

 

Hope that helps, let me know if it worked for you.

 

Regards,

 

Angelos

spencer046
8 - Asteroid

@AngelosPachis well actually I do not want the Revised Name 3 column, that was there to show what I have done so far versus what I need to do now for my expected output (Desired Output). The similarity index that you used, can it be implemented in Revised Name 2 column to find the similarity within the column? Based on that I need to do the grouping in Desired Output column. 

AngelosPachis
16 - Nebula

Oh ok, have you thought of using a Contains function @spencer046 ?

 

So if Revised Name 1 or 2(whatever is correct-they are the same) contains Miele, then return Miele international company

ELSEIF Revised name contains Los Angeles, then return LOS ANGELES SERVICE CENTER

ELSE Revised Name1

ENDIF

Labels