Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Match List A ID to List D Descriptions where the Description contains one or many ID's

KyJoLe
5 - Atom

Goal:

Match a Car (via Car ID field) in one data source to a Trip(s) (via free text Description field) in another data source. Essentially I want to do a left join where instead of a verbatim/exact match, I want to join on instances where the left join field is contained in the right join field.

 

Question:

Is there a way to get a list of ID's from one data source, and compare it to a list of Descriptions in another data source, to get a match/fuzzy match?

 

Supplemental Background:

Data Source 1: Contains a list of Cars (15,000 rows) and the Car's ID. This is an inventory of all our vehicles. Each row represents a single Car. The Car IDs are arbitrarily labeled and are inconsistent (i.e., one Car ID may be "15838293" and another might be "Kia_Car_29_2024")

Data Example 1:

Car ID
Kia-Forte_2018
Dougs Car
329874329874
Dougs Car Kia
AB1

 

Data Source 2: Contains a list of Trips (400,000 rows). This is an inventory of all the Trips that a Car took to say, deliver a package or pick up a client. Each row represents a single Trip. But, there is no "Car ID" field in Data Source 2 to directly link a Trip to a Car. However, a Description field does exist in which users will sometimes note the Car ID (e.g., "Personal. used car 15838293. 16 hours total." or "Delivery to warehouse. Kia_Car_29_2024. July 14th.").

Data Source Example 2: (bold being a match)

Trip IDDescription
1Trip to storage from warehouse - kia-forte_2018
2July 28 2019 - John - Warehouse to Service - AB112
3This was a trip from the data centers to the warehouse.
4Trip to Ohio. 329874329874
5James-1/2/2020-Warehouse
6Authorized vehicle driven to storage kia forte 2018
7Drove Dougs Car Kia to warehouse
8Vehicle to servicing
9N/A

 

Ideal Output:

Trip IDDescriptionMatched Car ID
1Trip to storage from warehouse - kia-forte_2018Kia-Forte_2018
2July 28 2019 - John - Warehouse to Service - AB112AB1
4Trip to Ohio. 329874329874329874329874
6Authorized vehicle driven to storage kia forte 2018Kia-Forte_2018
7Drove Dougs Car Kia to warehouseDougs Car
7Drove Dougs Car Kia to warehouseDougs Car Kia

 

Current Attempts:

Fuzzy Match: Tried this but it only allows you to bring in one data source

Find & Replace: Tried this and got results but I fear that if I had two Car ID's of "Dougs Car" and  "Dougs Car Kia" and a Trip Description of "Drove Dougs Car Kia to warehouse" it would match and therefore replace it with the incorrect "Dougs Car"

 

Any help would be appreciated, thank you!

7 REPLIES 7
rzdodson
12 - Quasar

@KyJoLe this should be what you are looking for. I used an Append/Formula combo to test whether the Car ID strings were inside the Trip Description. Hope that helps!

 

Edit: Added Generate Rows tool to the top of the workflow to account for Car IDs with special characters omitted (re: Car ID 6).

 

SPetrie
12 - Quasar

Here is my take on a possible solution.

I use regex replace to replace special characters with optional single character placeholders and then use them as a pattern in a regex match inside a macro.

macro.PNGoutput.PNG

 

AGilbert
10 - Fireball

@SPetrie do you have any documentation on the "question.controlparam" variable? I didn't know this technique existed.  

SPetrie
12 - Quasar

@AGilbert ,I learned it from this blog entry.

I like to use it because I dont like lots of action tools cluttering up the workflow and you can set values in the workflow tab of the configuration window to make testing easier.

KyJoLe
5 - Atom

Thank you rzdodson and SPetrie! I'm currently modifying/configuring and testing your solutions.

It's taking a little longer because the data sets I'm comparing are large (2 lists with ~150k+ rows = ~22 billion combinations if I understand the flow right).

I'll let you know the outcome once I finish.

shash270
5 - Atom

Hi Guys,

 

I thought this would be helpful for my GL analysis for tax returns, I download GL data line by line description and then find keywords based on which the line items can be deductible and non-deductible. Let me know if this can be tweaked for that?

 

th

SPetrie
12 - Quasar

@shash270 Im sure tweaks can be made, but I think it will be best for you to create your own post with example inputs and expected output rather than continue it here.

Labels