
Alteryx Designer Desktop Discussions

SOLVED

Data match company name

emil
7 - Meteor

Dear all,

 

I have a question about which Alteryx tool to use when matching large datasets. I have a source file of 4M+ records and a target file of 2K records. I need to pull information from the source data to update the target. The only way to match the two is by company name, and as you can imagine, the company names can differ in punctuation ...

 

When I use the Join tool, only about 10% of the target records get updated.

 

When I use the Fuzzy Match tool, it never finishes.

 

Your input is much appreciated.

 

Thanks,

Andy

afv2688
16 - Nebula

I would recommend using the Find Replace tool with the "Append Field(s) to Record" option.

 

This should help you get more matches.
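As a rough illustration outside Alteryx, appending fields on a match behaves like a left lookup: every target row is kept, and fields from the matching source row are added to it. The sketch below assumes whole-field, exact matching on the company name, and that the first source row wins when a name appears more than once; `append_fields` and the sample data are hypothetical, and this is not how the Find Replace tool is implemented internally.

```python
def append_fields(target_rows, source_rows, key, fields):
    """Append `fields` from the source row whose `key` equals the target row's `key`.

    Assumption: exact whole-field matching, first source row wins on duplicates.
    Target rows without a match are kept, with None for each appended field.
    """
    lookup = {}
    for row in source_rows:
        # Keep only the first source row seen for each key.
        lookup.setdefault(row[key], row)

    result = []
    for row in target_rows:
        match = lookup.get(row[key])
        extra = {f: (match[f] if match is not None else None) for f in fields}
        result.append({**row, **extra})
    return result


# Usage example with hypothetical data.
target = [{"company": "Acme Inc"}, {"company": "Globex"}]
source = [
    {"company": "Acme Inc", "city": "Denver"},
    {"company": "Acme Inc", "city": "Boston"},
]
print(append_fields(target, source, "company", ["city"]))
```

Because the lookup is exact, "Acme Inc." and "Acme Inc" would still miss each other, which is why cleaning the names first matters.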

 

I would also add a Data Cleansing tool to remove all punctuation, with a Find Replace step before it to standardize misspellings and abbreviations.
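Here is a minimal sketch of that clean-up in plain Python, assuming the goal is a normalized key that an exact join can match on. The order follows the suggestion above: first standardize abbreviations on whole words (the earlier Find Replace), then strip punctuation (the cleansing step). `ABBREVIATIONS` is a hypothetical lookup table; in practice you would build it from the variants you actually see in your data.

```python
import re
import string

# Hypothetical abbreviation/misspelling table (long form -> standard form).
ABBREVIATIONS = {
    "incorporated": "inc",
    "corporation": "corp",
    "company": "co",
    "limited": "ltd",
}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_company(name):
    text = name.lower()
    # Step 1: replace whole words only, so "corporation" does not hit "incorporated".
    for long_form, short_form in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(long_form)}\b", short_form, text)
    # Step 2: remove punctuation.
    text = text.translate(_STRIP_PUNCT)
    # Step 3: collapse any leftover runs of whitespace.
    return " ".join(text.split())


print(normalize_company("Acme Corporation, Inc."))
print(normalize_company("ACME Corp Inc"))
```

Both calls produce the same key, so an exact Join on the normalized field would match them even though the raw names differ.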

 

Cheers

ThizViz
11 - Bolide

Fuzzy Match is very likely never to finish unless you use a "waterfall" method.

 

Set your match criteria (either high or low thresholds, depending on your methodology), run the fuzzy match, and set aside the records that matched.

 

Then take the unmatched records, change the match thresholds, and run the fuzzy match again.

 

Keep adjusting the thresholds incrementally. Once you've reached a satisfactory match percentage, union all the outputs from the prior fuzzy matches.
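The waterfall loop described above can be sketched in Python roughly as follows. Assumptions: `difflib.SequenceMatcher.ratio()` stands in for Fuzzy Match's scoring (it is not Alteryx's algorithm), thresholds run from strict to loose (one of the two orderings mentioned above), and `waterfall_match` is a hypothetical helper. It compares every remaining target against every candidate, so it shows the logic only and would be far too slow for 4M candidates in plain Python.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Score in [0, 1]; a stand-in for the Fuzzy Match tool's match score.
    return SequenceMatcher(None, a, b).ratio()


def waterfall_match(targets, candidates, thresholds):
    """Run one matching pass per threshold, feeding only unmatched targets forward.

    Returns (union_of_matches, still_unmatched), where each match is
    (target, best_candidate, score).
    """
    passes = []                 # one list of matches per threshold
    remaining = list(targets)
    for threshold in thresholds:
        matched, leftover = [], []
        for name in remaining:
            # Score every candidate and keep the single best one.
            best_score, best = max((similarity(name, c), c) for c in candidates)
            if best_score >= threshold:
                matched.append((name, best, best_score))
            else:
                leftover.append(name)
        passes.append(matched)  # set aside this pass's matches
        remaining = leftover    # only unmatched records go to the next pass
    # "Union" the per-pass outputs into one result set.
    union = [row for batch in passes for row in batch]
    return union, remaining


matches, unmatched = waterfall_match(
    ["acme corp", "globex", "umbrella"],
    ["acme corp", "globex inc", "initech"],
    [0.95, 0.7],
)
print(matches)
print(unmatched)
```

Here "acme corp" matches in the first, strict pass, "globex" only clears the looser second pass (its score against "globex inc" is 0.75), and "umbrella" is left over for review or a further pass.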

 

I hope that makes sense. I got the waterfall technique from this training video: https://community.alteryx.com/t5/Live-Training/Live-Training-Fuzzy-Matching-Intermediate-Users/td-p/...

 

The suggestion to start with a low threshold came from another solutions engineer. Starting low means you don't go through successive iterations only to find that you set the cutoff at 65% when 64% is really the magic number.

@thizviz aka cbridges, Bolide
http://community.alteryx.com/t5/user/viewprofilepage/user-id/2328