Alteryx Designer Desktop Discussions

spencer046 · ‎01-13-2021

Hello dear people,

I have the below dataset of different versions of company names for a particular location key group (Los Angeles_CA_90023). Currently I have "Revised Name 3" that replaces all the names in "Revised Name 2" column with the one with highest count. Now, while there are different company name versions of the "Miele" company, some of the company names are entirely different, and I want to replace the names based on either the highest count of their own name similarities (maybe based on the first word or two of the entire name), or leave it as it is (in case there is no potential similarity). My Desired outcome is shown on the right most column. Need help badly!

Location Key	Revised Name 1	Revised Name 2	Revised Name 3	Count	Desired outcome
Los Angeles_CA_90023	Miele international company	Miele international company	Miele international company	6466	Miele international company
Los Angeles_CA_90023	MIELE DEPENDABLE CENTER	MIELE DEPENDABLE CENTER	Miele international company	18	Miele international company
Los Angeles_CA_90023	MIELE DEPENDABLE DISTRI	MIELE DEPENDABLE DISTRI	Miele international company	16	Miele international company
Los Angeles_CA_90023	LOS ANGELES SERVICE CENTER	LOS ANGELES SERVICE CENTER	Miele international company	11	LOS ANGELES SERVICE CENTER
Los Angeles_CA_90023	LOS ANGELES SERVICE	LOS ANGELES SERVICE	Miele international company	9	LOS ANGELES SERVICE CENTER
Los Angeles_CA_90023	DDC CALIFORNIA	DDC CALIFORNIA	Miele international company	4	DDC CALIFORNIA
Los Angeles_CA_90023	Dependable Distribution Center	Dependable Distribution Center	Miele international company	3	Dependable Distribution Center
Los Angeles_CA_90023	BAY CITIES C O	BAY CITIES C O	Miele international company	2	BAY CITIES C O
Los Angeles_CA_90023	BAY CITIES C O BARBARA	BAY CITIES C O BARBARA	Miele international company	1	BAY CITIES C O
Los Angeles_CA_90023	MIELE DEPENDABLE	MIELE DEPENDABLE	Miele international company	1	Miele international company
Los Angeles_CA_90023	MIELE DEPENDABLE LOGISTCS	MIELE DEPENDABLE LOGISTCS	Miele international company	1	Miele international company

AngelosPachis · ‎01-13-2021

Hello @spencer046 ,

Do you mind giving the following workflow a try. If I understand the logic correctly, you want to see if the names under Revised Name 3 have any similarity to those of Revised Name 2

If there is no similarity, keep Revised Name 2 as an output, else maintain Revised Name 3.

Hope that helps, let me know if it worked for you.

Regards,

Angelos

spencer046 · ‎01-14-2021

@AngelosPachis well actually I do not want the Revised Name 3 column, that was there to show what I have done so far versus what I need to do now for my expected output (Desired Output). The similarity index that you used, can it be implemented in Revised Name 2 column to find the similarity within the column? Based on that I need to do the grouping in Desired Output column.

AngelosPachis · ‎01-14-2021

Oh ok, have you thought of using a Contains function @spencer046 ?

So if Revised Name 1 or 2(whatever is correct-they are the same) contains Miele, then return Miele international company

ELSEIF Revised name contains Los Angeles, then return LOS ANGELES SERVICE CENTER

ELSE Revised Name1

ENDIF

Alteryx Designer Desktop Discussions

Multi Row Name cleansing