Hi
I have a list of companies mapped by industry subsector. Then I have another reference table mapping subsectors to sectors.
I want to append the sectors as a column to my main main company list table. As the subsector names in both tables vary slightly (space before / etc.) I am using fuzzy match.
The matching process delivers expected results but I am having trouble appending the sector column to my main table as. Currently I use two join tools plus a unique tool as the two joins create a variety of duplicate rows. I have looked at other forum solutions but couldn't make it work. I feel there must be a better/more accurate/elegant way of doing this.
Appreciate any help, thanks!
Solved! Go to Solution.
hi @Simon2902,
Attached you can find my proposal to join both data sets and have the sector as a new column
What I noticed is that you needed to clean the keys. I applied the data cleansing tool to do that, as well as the formula tool to unify things like
Petrochemicals- Base
Petrochemicals -Base
Petrochemicals-Base
To be: Petrochemicals - Base
The same happened with the "/"
It is still not perfect, you need to clean 13 keys, but these ones have "different" names
For example:
Building Suppliers (Trade) vs Building Suppliers (Trade)/Diy
or
Security & Safety Systems vs Security & Safety
These differences can be cached with the fuzzy match tool
I hope this helps
Let me know if you have any questions.
Best,
Vianney
Thanks @VianneyM
that solution helps to remove duplicates. I added in a fuzzy matching step similar to my original workflow which, together, works out well. I will accept your solution, workflow attached.