Alteryx Designer Discussions

SOLVED

Data match company name

Meteor

Dear all,

 

I have a question about which Alteryx tool to use for matching large data sets. I have a source file of 4M+ records and a target file of 2K records. I need to pull information from the source data to update the target. The only way to match the two is by company name. As you may imagine, the company names can differ in punctuation ...

 

When I use the Join tool, only about 10% of the target records get updated.

 

When I use the Fuzzy Match tool, the workflow never finishes.

 

Your input is much appreciated.

 

Thanks,

Andy

Alteryx Partner

I would recommend using the Find Replace tool with the "Append Field(s) to Record" option.

 

This should help you get more matches.

 

I would also add a cleansing tool to remove all punctuation, preceded by another Find Replace that standardizes misspellings and abbreviations.
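
For readers who want to see the cleanup idea outside Alteryx, here is a minimal Python sketch of the same steps: strip punctuation, standardize abbreviations, then do an exact join on the cleaned name. The `ABBREVIATIONS` map, the function names, and the `"name"` field are all hypothetical and only for illustration; they are not part of any Alteryx tool.

```python
import re

# Hypothetical abbreviation map (an assumption); extend it from your own data.
ABBREVIATIONS = {
    "corp": "corporation",
    "inc": "incorporated",
    "ltd": "limited",
    "co": "company",
}

def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand known abbreviations."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)

def exact_join(source_rows, target_names):
    """Map each target name to the first source row with the same normalized name.

    Targets without a match map to None.
    """
    index = {}
    for row in source_rows:
        # setdefault keeps the first source row seen for each normalized key
        index.setdefault(normalize_name(row["name"]), row)
    return {name: index.get(normalize_name(name)) for name in target_names}

source_rows = [{"name": "Acme Corp.", "city": "Paris"}]
result = exact_join(source_rows, ["ACME, Corporation", "Globex Ltd"])
```

Building the index once over the large file and looking up each target name keeps the work roughly linear in the number of records, which is why cleaning first and joining exactly is worth trying before any fuzzy pass.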

 

Cheers

ACE Emeritus

Fuzzy match is very likely to never end unless you use a "waterfall" method....

 

Set your match criteria (either high or low thresholds, depending on your methodology), do the fuzzy match and set aside records that have a match.

 

Then take the unmatched records, change the match thresholds, and run the fuzzy match again.

 

Keep incrementally changing the thresholds. Once you've got a satisfactory match percentage, you can union all the outputs from prior fuzzy matches.

 

I hope that makes sense. I got the waterfall technique from this training video: https://community.alteryx.com/t5/Live-Training/Live-Training-Fuzzy-Matching-Intermediate-Users/td-p/...

 

The suggestion to start with a low threshold came from another solutions engineer, who recommended it so that you don't go through successive iterations only to find that you set the cutoff at 65% when 64% was really the magic number.
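
As a rough illustration of the waterfall idea outside Alteryx, here is a minimal Python sketch using the standard library's `difflib.SequenceMatcher`. It is not the Fuzzy Match tool's algorithm; the function names and the threshold values are assumptions, and the passes here run from strict to loose, which is only one of the two orderings mentioned above. Because this sketch scores every pass the same way, its main value is tagging each match with the pass that accepted it, so that looser matches can be reviewed separately.

```python
from difflib import SequenceMatcher

def best_match(name, candidates):
    """Return (candidate, score) for the most similar candidate, or (None, 0.0)."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, name, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

def waterfall_match(targets, sources, thresholds=(0.95, 0.85, 0.75)):
    """Run one pass per threshold; only unmatched names go on to the next pass."""
    matched = {}
    unmatched = list(targets)
    for threshold in thresholds:
        still_unmatched = []
        for name in unmatched:
            candidate, score = best_match(name, sources)
            if candidate is not None and score >= threshold:
                # Set the match aside, tagged with the pass that accepted it
                matched[name] = (candidate, score, threshold)
            else:
                still_unmatched.append(name)
        unmatched = still_unmatched
    # "matched" is the union of all passes; "unmatched" is what is left over
    return matched, unmatched

sources = ["acme corporation", "globex limited"]
targets = ["acme corporation", "globex limitd", "zzz"]
matched, unmatched = waterfall_match(targets, sources)
```

This brute-force comparison checks every target against every source, so at 2K by 4M records it would be slow; cleaning the names and removing exact matches first shrinks the set that needs fuzzy scoring.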

@thizviz aka cbridges, Bolide
http://community.alteryx.com/t5/user/viewprofilepage/user-id/2328