Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Matching records from different sources

Blondek
7 - Meteor

Hi all, 

 

This is a familiar subject, yet none of the previous threads can answer my question! 

 

I need to do a data reconciliation between two sources - based on the name of the TV episode broadcast. In one source, the title of the show will be 'Show Title' and the episode '01 - Episode 1'. In the other data source, this will be 'SHOW TITLE_' and 'EPISODE 1'. 

 

I'd like to use the list of shows in the clean data as a reference for the dirty data - i.e. if the title of the show is included in this list, show the other columns. But of course, I need some approximate match. 

 

Thanks, 

 

Matt

7 REPLIES 7
ChrisTX
15 - Aurora

Can you post some sample input and expected output?

 

Will the two sources ALWAYS follow the same pattern?

source 1.....title of the show will be 'Show Title' and the episode '01 - Episode 1'

source 2....will be 'SHOW TITLE_' and 'EPISODE 1'. 

 

If yes, you can use a formula or regex tool to create a field to be used in a Join.

 

If the matching is sometimes more complicated, try the regular Join first, then follow with a Fuzzy Match.  This page has a few videos to help with Fuzzy Match: https://community.alteryx.com/t5/Videos/Video-Training-Index/td-p/45161

 

Chris

Blondek
7 - Meteor

Ah, thanks Chris. 

 

Here's a sample of what I'm trying to do. I'm trying to join them together and investigate any differences in the $ amount in Field 3. 

 

Thanks

ChrisTX
15 - Aurora

See attached workflow.  I used the Text to Columns tool to split the Episode Name from source 1.

 

Annotation 2020-04-01 064710.png

 

Chris

Blondek
7 - Meteor

Thanks Chris, 

 

Though in this incredibly large dataset, it's not always '01 - Episode 1'; what I was trying to say in my poor explanation is that they're almost always somewhat different, so I'd need some kind of approximate match! 

 

Thanks, 

 

Matt

ChrisTX
15 - Aurora

If you need an approximate match, trying using a Formula where possible to crate an exact match field to use in a Join.  The follow with a Fuzzy Match tool non-exact matches.  See the Fuzzy Match training videos mentioned above.

 

Blondek
7 - Meteor

Thanks Chris, 

 

I've now done the Fuzzy Match and I have two columns now - Clean and Dirty. I'd like to take that dataset and match it back to the dirty set, replacing the dirty names with the clean names. 

 

Before I do a match tool, I'm on 370,000 fields. After that, I'm on 5.6m. Do you know why? Am I doing it wrong? 

 

Cheers

ChrisTX
15 - Aurora

If you are generating a key for each word, you'll have more records output from the Fuzzy Match.

I'm guessing you're asking about the number of records, even though you mentioned "370,000 fields".

 

There are two Fuzzy Matching beginner videos.  This one https://community.alteryx.com/t5/Videos/Introduction-to-Fuzzy-Matching/td-p/43884 mentions that you'll always want to include a Unique tool after the Fuzzy Match tool.

Labels