Hello dear people,
I have the below dataset of different versions of company names for a particular location key group (Los Angeles_CA_90023). Currently I have "Revised Name 3" that replaces all the names in "Revised Name 2" column with the one with highest count. Now, while there are different company name versions of the "Miele" company, some of the company names are entirely different, and I want to replace the names based on either the highest count of their own name similarities (maybe based on the first word or two of the entire name), or leave it as it is (in case there is no potential similarity). My Desired outcome is shown on the right most column. Need help badly!
Location Key | Revised Name 1 | Revised Name 2 | Revised Name 3 | Count | Desired outcome |
Los Angeles_CA_90023 | Miele international company | Miele international company | Miele international company | 6466 | Miele international company |
Los Angeles_CA_90023 | MIELE DEPENDABLE CENTER | MIELE DEPENDABLE CENTER | Miele international company | 18 | Miele international company |
Los Angeles_CA_90023 | MIELE DEPENDABLE DISTRI | MIELE DEPENDABLE DISTRI | Miele international company | 16 | Miele international company |
Los Angeles_CA_90023 | LOS ANGELES SERVICE CENTER | LOS ANGELES SERVICE CENTER | Miele international company | 11 | LOS ANGELES SERVICE CENTER |
Los Angeles_CA_90023 | LOS ANGELES SERVICE | LOS ANGELES SERVICE | Miele international company | 9 | LOS ANGELES SERVICE CENTER |
Los Angeles_CA_90023 | DDC CALIFORNIA | DDC CALIFORNIA | Miele international company | 4 | DDC CALIFORNIA |
Los Angeles_CA_90023 | Dependable Distribution Center | Dependable Distribution Center | Miele international company | 3 | Dependable Distribution Center |
Los Angeles_CA_90023 | BAY CITIES C O | BAY CITIES C O | Miele international company | 2 | BAY CITIES C O |
Los Angeles_CA_90023 | BAY CITIES C O BARBARA | BAY CITIES C O BARBARA | Miele international company | 1 | BAY CITIES C O |
Los Angeles_CA_90023 | MIELE DEPENDABLE | MIELE DEPENDABLE | Miele international company | 1 | Miele international company |
Los Angeles_CA_90023 | MIELE DEPENDABLE LOGISTCS | MIELE DEPENDABLE LOGISTCS | Miele international company | 1 | Miele international company |
Hello @spencer046 ,
Do you mind giving the following workflow a try. If I understand the logic correctly, you want to see if the names under Revised Name 3 have any similarity to those of Revised Name 2
If there is no similarity, keep Revised Name 2 as an output, else maintain Revised Name 3.
Hope that helps, let me know if it worked for you.
Regards,
Angelos
@AngelosPachis well actually I do not want the Revised Name 3 column, that was there to show what I have done so far versus what I need to do now for my expected output (Desired Output). The similarity index that you used, can it be implemented in Revised Name 2 column to find the similarity within the column? Based on that I need to do the grouping in Desired Output column.
Oh ok, have you thought of using a Contains function @spencer046 ?
So if Revised Name 1 or 2(whatever is correct-they are the same) contains Miele, then return Miele international company
ELSEIF Revised name contains Los Angeles, then return LOS ANGELES SERVICE CENTER
ELSE Revised Name1
ENDIF