Goal:
Match a Car (via Car ID field) in one data source to a Trip(s) (via free text Description field) in another data source. Essentially I want to do a left join where instead of a verbatim/exact match, I want to join on instances where the left join field is contained in the right join field.
Question:
Is there a way to get a list of ID's from one data source, and compare it to a list of Descriptions in another data source, to get a match/fuzzy match?
Supplemental Background:
Data Source 1: Contains a list of Cars (15,000 rows) and the Car's ID. This is an inventory of all our vehicles. Each row represents a single Car. The Car IDs are arbitrarily labeled and are inconsistent (i.e., one Car ID may be "15838293" and another might be "Kia_Car_29_2024")
Data Example 1:
Car ID |
Kia-Forte_2018 |
Dougs Car |
329874329874 |
Dougs Car Kia |
AB1 |
Data Source 2: Contains a list of Trips (400,000 rows). This is an inventory of all the Trips that a Car took to say, deliver a package or pick up a client. Each row represents a single Trip. But, there is no "Car ID" field in Data Source 2 to directly link a Trip to a Car. However, a Description field does exist in which users will sometimes note the Car ID (e.g., "Personal. used car 15838293. 16 hours total." or "Delivery to warehouse. Kia_Car_29_2024. July 14th.").
Data Source Example 2: (bold being a match)
Trip ID | Description |
1 | Trip to storage from warehouse - kia-forte_2018 |
2 | July 28 2019 - John - Warehouse to Service - AB112 |
3 | This was a trip from the data centers to the warehouse. |
4 | Trip to Ohio. 329874329874 |
5 | James-1/2/2020-Warehouse |
6 | Authorized vehicle driven to storage kia forte 2018 |
7 | Drove Dougs Car Kia to warehouse |
8 | Vehicle to servicing |
9 | N/A |
Ideal Output:
Trip ID | Description | Matched Car ID |
1 | Trip to storage from warehouse - kia-forte_2018 | Kia-Forte_2018 |
2 | July 28 2019 - John - Warehouse to Service - AB112 | AB1 |
4 | Trip to Ohio. 329874329874 | 329874329874 |
6 | Authorized vehicle driven to storage kia forte 2018 | Kia-Forte_2018 |
7 | Drove Dougs Car Kia to warehouse | Dougs Car |
7 | Drove Dougs Car Kia to warehouse | Dougs Car Kia |
Current Attempts:
Fuzzy Match: Tried this but it only allows you to bring in one data source
Find & Replace: Tried this and got results but I fear that if I had two Car ID's of "Dougs Car" and "Dougs Car Kia" and a Trip Description of "Drove Dougs Car Kia to warehouse" it would match and therefore replace it with the incorrect "Dougs Car"
Any help would be appreciated, thank you!
@KyJoLe this should be what you are looking for. I used an Append/Formula combo to test whether the Car ID strings were inside the Trip Description. Hope that helps!
Edit: Added Generate Rows tool to the top of the workflow to account for Car IDs with special characters omitted (re: Car ID 6).
@SPetrie do you have any documentation on the "question.controlparam" variable? I didn't know this technique existed.
Thank you rzdodson and SPetrie! I'm currently modifying/configuring and testing your solutions.
It's taking a little longer because the data sets I'm comparing are large (2 lists with ~150k+ rows = ~22 billion combinations if I understand the flow right).
I'll let you know the outcome once I finish.
Hi Guys,
I thought this would be helpful for my GL analysis for tax returns, I download GL data line by line description and then find keywords based on which the line items can be deductible and non-deductible. Let me know if this can be tweaked for that?
th
@shash270 Im sure tweaks can be made, but I think it will be best for you to create your own post with example inputs and expected output rather than continue it here.
Thanks @Spetrie !