A Find & Join combination - how to accomplish a complex join?
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi there community!
I have some financial data that I am trying to analyse, unfortunately I can't seem to find a way to combine the join and find replace tools in a way that will accomplish what I am trying to do. My current solution (see screenshot below) is very messy and manual whenever it needs to be updated.
My actual data I can't upload, but I have created a small sample dataset using different data to show what I am trying to achieve.
Data table;
City | Shop Name |
Berlin | KaufCentrum |
Berlin | Max Werk NER |
Berlin | EDEKA Markt |
Berlin | LIDL |
London | Billy Corner Store |
London | Borough Market |
London | Tescos Supermarket |
London | Leasure Centre |
Reference Table;
City | Text in Shop Name find | Shop Category |
Berlin | Kauf | General |
Berlin | ner | Hardware |
Berlin | Markt | Supermarket |
Berlin | Supermarket | Supermarket |
London | Supermarket | Supermarket |
London | Mark | Market |
London | Corner | Convinence |
Desired Results Table ;
City | Shop Name | Shop Category |
Berlin | KaufCentrum | General |
Berlin | Max Werk NER | Hardware |
Berlin | EDEKA Markt | Supermarket |
Berlin | LIDL | - |
London | Billy Corner Store | Convinence |
London | Borough Market | Market |
London | Tescos Supermarket | Supermarket |
London | Leasure Centre | - |
So what I want is to enrich the data table using my reference table, I want to use a join on the [City] to [City] and then simultaneously a findstring to match some amount of the [Shop Name] to the [Text in Shop Name find], returning the [Shop Category]. My actual data is ofcourse much much larger in scale and complexity than this so the solution needs to scale to ~100k Shop Names rows in the data table and ~1k [Text in Shop Name find] rows in the reference table.
The problems with my current approach is that I can either join on the City or do the findstring, but doing both at once I can't find a way, my current solution splits the data by city using 20+ filters so that I can then apply the findstring, but this is horrible to maintain.
There might be details I am missing out here, but I hope I have been clear in what I need to achieve, many thanks to anyone who helps 🙂
Solved! Go to Solution.
- Labels:
- Fuzzy Match
- Join
- Optimization
- Regex
- Workflow
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I only wanted to use this has last resort because it will create a lot of data. Check does the below workflow work for you.
Workflow:
Hope this helps : )
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@atcodedog05 I love it - just tested on my small sample data and got 200k rows back, only starting with 800 - will take me a little more to test the full dataset, it might be quite a few million rows but lets see if it works.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Full data gives me 3mil just after the append with 1.3GB of data but my machine can just about handle it! Will have proper interrigation of the data tomorrow as it will take me a while to check it all works.
Thanks so much for your help @atcodedog05
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Happy to help : ) @Glycogenolysis
Cheers and have a nice day!
- « Previous
-
- 1
- 2
- Next »