Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Match List A ID to List D Descriptions where the Description contains one or many ID's

KyJoLe
Atom

Goal:

Match a Car (via Car ID field) in one data source to a Trip(s) (via free text Description field) in another data source. Essentially I want to do a left join where instead of a verbatim/exact match, I want to join on instances where the left join field is contained in the right join field.

 

Question:

Is there a way to get a list of ID's from one data source, and compare it to a list of Descriptions in another data source, to get a match/fuzzy match?

 

Supplemental Background:

Data Source 1: Contains a list of Cars (15,000 rows) and the Car's ID. This is an inventory of all our vehicles. Each row represents a single Car. The Car IDs are arbitrarily labeled and are inconsistent (i.e., one Car ID may be "15838293" and another might be "Kia_Car_29_2024")

Data Example 1:

Car ID
Kia-Forte_2018
Dougs Car
329874329874
Dougs Car Kia
AB1

 

Data Source 2: Contains a list of Trips (400,000 rows). This is an inventory of all the Trips that a Car took to say, deliver a package or pick up a client. Each row represents a single Trip. But, there is no "Car ID" field in Data Source 2 to directly link a Trip to a Car. However, a Description field does exist in which users will sometimes note the Car ID (e.g., "Personal. used car 15838293. 16 hours total." or "Delivery to warehouse. Kia_Car_29_2024. July 14th.").

Data Source Example 2: (bold being a match)

Trip IDDescription
1Trip to storage from warehouse - kia-forte_2018
2July 28 2019 - John - Warehouse to Service - AB112
3This was a trip from the data centers to the warehouse.
4Trip to Ohio. 329874329874
5James-1/2/2020-Warehouse
6Authorized vehicle driven to storage kia forte 2018
7Drove Dougs Car Kia to warehouse
8Vehicle to servicing
9N/A

 

Ideal Output:

Trip IDDescriptionMatched Car ID
1Trip to storage from warehouse - kia-forte_2018Kia-Forte_2018
2July 28 2019 - John - Warehouse to Service - AB112AB1
4Trip to Ohio. 329874329874329874329874
6Authorized vehicle driven to storage kia forte 2018Kia-Forte_2018
7Drove Dougs Car Kia to warehouseDougs Car
7Drove Dougs Car Kia to warehouseDougs Car Kia

 

Current Attempts:

Fuzzy Match: Tried this but it only allows you to bring in one data source

Find & Replace: Tried this and got results but I fear that if I had two Car ID's of "Dougs Car" and  "Dougs Car Kia" and a Trip Description of "Drove Dougs Car Kia to warehouse" it would match and therefore replace it with the incorrect "Dougs Car"

 

Any help would be appreciated, thank you!

8 ANTWORTEN 8
rzdodson
Quasar

@KyJoLe this should be what you are looking for. I used an Append/Formula combo to test whether the Car ID strings were inside the Trip Description. Hope that helps!

 

Edit: Added Generate Rows tool to the top of the workflow to account for Car IDs with special characters omitted (re: Car ID 6).

 

SPetrie
Quasar

Here is my take on a possible solution.

I use regex replace to replace special characters with optional single character placeholders and then use them as a pattern in a regex match inside a macro.

macro.PNGoutput.PNG

 

AGilbert
Bolide

@SPetrie do you have any documentation on the "question.controlparam" variable? I didn't know this technique existed.  

SPetrie
Quasar

@AGilbert ,I learned it from this blog entry.

I like to use it because I dont like lots of action tools cluttering up the workflow and you can set values in the workflow tab of the configuration window to make testing easier.

KyJoLe
Atom

Thank you rzdodson and SPetrie! I'm currently modifying/configuring and testing your solutions.

It's taking a little longer because the data sets I'm comparing are large (2 lists with ~150k+ rows = ~22 billion combinations if I understand the flow right).

I'll let you know the outcome once I finish.

shash270
Atom

Hi Guys,

 

I thought this would be helpful for my GL analysis for tax returns, I download GL data line by line description and then find keywords based on which the line items can be deductible and non-deductible. Let me know if this can be tweaked for that?

 

th

SPetrie
Quasar

@shash270 Im sure tweaks can be made, but I think it will be best for you to create your own post with example inputs and expected output rather than continue it here.

shash270
Atom

Thanks @Spetrie !

Beschriftungen