Hi there community!
I have some financial data that I am trying to analyse, unfortunately I can't seem to find a way to combine the join and find replace tools in a way that will accomplish what I am trying to do. My current solution (see screenshot below) is very messy and manual whenever it needs to be updated.
My actual data I can't upload, but I have created a small sample dataset using different data to show what I am trying to achieve.
Data table;
City | Shop Name |
Berlin | KaufCentrum |
Berlin | Max Werk NER |
Berlin | EDEKA Markt |
Berlin | LIDL |
London | Billy Corner Store |
London | Borough Market |
London | Tescos Supermarket |
London | Leasure Centre |
Reference Table;
City | Text in Shop Name find | Shop Category |
Berlin | Kauf | General |
Berlin | ner | Hardware |
Berlin | Markt | Supermarket |
Berlin | Supermarket | Supermarket |
London | Supermarket | Supermarket |
London | Mark | Market |
London | Corner | Convinence |
Desired Results Table ;
City | Shop Name | Shop Category |
Berlin | KaufCentrum | General |
Berlin | Max Werk NER | Hardware |
Berlin | EDEKA Markt | Supermarket |
Berlin | LIDL | - |
London | Billy Corner Store | Convinence |
London | Borough Market | Market |
London | Tescos Supermarket | Supermarket |
London | Leasure Centre | - |
So what I want is to enrich the data table using my reference table, I want to use a join on the [City] to [City] and then simultaneously a findstring to match some amount of the [Shop Name] to the [Text in Shop Name find], returning the [Shop Category]. My actual data is ofcourse much much larger in scale and complexity than this so the solution needs to scale to ~100k Shop Names rows in the data table and ~1k [Text in Shop Name find] rows in the reference table.
The problems with my current approach is that I can either join on the City or do the findstring, but doing both at once I can't find a way, my current solution splits the data by city using 20+ filters so that I can then apply the findstring, but this is horrible to maintain.
There might be details I am missing out here, but I hope I have been clear in what I need to achieve, many thanks to anyone who helps 🙂
Solved! Go to Solution.
Here is how you can do it.
Workflow:
1. Using find and replace to get required columns.
2. Using filter to match cities.
Hope this helps : )
Thanks that is so simple I am annoyed I didnt think of it - I'll give it a go and report back if it works for my full usecase, but I expect it to. Many thanks
I added some new data. I have updated my workflow to improve the city join case and keep all rows. Please take a look into it.
Workflow:
Hope this helps : )
Hi @atcodedog05
Thanks a lot for your help, but even with that ammendment I still have some issues due to substrings not matching when a full string matches instead - I've slightly adjusted my input data to match some of my financial data where I am seeing large holes of missing matches. In the below "Tescos Supermarket" is matching to "Supermarket" in Berlin instead of "Super" in London, hense I miss this match after the formula result and the Find Replace completely misses this field - I need a more matrix way of matching these so that I can get multiple matches.... and im back to being a bit puzzled.
Data Table;
City | Shop Name |
Berlin | KaufCentrum |
Berlin | Max Werk NER |
Berlin | EDEKA Markt |
Berlin | LIDL |
London | Billy Corner Store |
London | Borough Market |
London | Tescos Supermarket |
London | Leasure Centre |
Ref Table;
City | Text in Shop Name find | Shop Category |
Berlin | Kauf | General |
Berlin | ner | Hardware |
Berlin | Markt | Supermarket |
Berlin | Supermarket | Supermarket |
London | Super | Supermarket |
London | Mark | Market |
London | Corner | Convinence |
Output Data;
City | Shop Name | Shop Category | Cities |
Berlin | KaufCentrum | General | Berlin |
Berlin | Max Werk NER | Hardware | Berlin |
Berlin | EDEKA Markt | Supermarket | Berlin |
Berlin | LIDL | - | |
London | Billy Corner Store | Convinence | London |
London | Borough Market | Market | London |
London | Tescos Supermarket | - | Berlin |
London | Leasure Centre | - |
I have posted an updated workflow above. It works for this scenario.
Workflow:
Hope this helps : )
@atcodedog05 The Output data has >[London] [Tescos Supermarket] as "-" for the [Shop Category]
Yea out ouputs are matching. The issue is that I have "Tescos Supermarket" which matches to "Supermarket" in Berlin and does not match to "Super" in London, the City of the "Tescos Supermarket" is London, so I only get the 1 match back from Berlin leading to a "-". For my full workflow I need to get that match on the "Super" and then I can also see the London results hense this "Tescos Supermarket" should get a category, but in the current process it does not due to matching to Berlin instead of the partial string in London.
Does that make sense? I have tried my best to explain the issue - in this example this looks like a relatively trivial problem, but in my data this is thousands and thousands of rows 😞