Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Fuzzy Match for Finding duplicate Invoice numbers

ShreyG
5 - Atom

I am looking to remove duplicates by matching two sets of invoice numbers and setting the match percentage between them. I have been trying to use the Merge option in Fuzzy match tool to perform the same but with no output in any match style.
New Invoice numbersNew Invoice numbersOld Invoice numbersOld Invoice numbersUnion of two databasesUnion of two databasestool configurationtool configuration 

For example, INV0001 and INV001 should show around 90% match which could then be used to remove duplicate invoices in new database.

Thanking in advance for any help!

5 REPLIES 5
randreag
11 - Bolide

Hello @ShreyG 

 

I have used fuzzy in a couple of opportunities and what I have seen is that if you use the same field to generate the Matchkey and the algorithm the fuzzy failed to do it. But if you use another field in this case I just use a constant 1 to generate the key and the match algrorithm i used with the field that matters it is better.

 

I attached a little example of what I just explained

 

i hope it helps 

echuong1
Alteryx Alumni (Retired)

Is the possible difference between the two invoices the leading zeroes after "INV"? If so, I recommend stripping the leading zeros in each invoice number and then doing the match off of that. I used a data cleansing tool to remove the "INV" portion, and then a formula tool to remove the leading zeroes and to put the "INV" part back.

 

I'd advise against using something like fuzzy match for an invoice like this because so many invoices could be similar to one another. For example, INV1000, INV0100, and INV0010 all contain the same digits and could show up as a match. They all have the same likelihood of being one or the other.

 

echuong1_0-1597105354209.png

 

ShreyG
5 - Atom

@aruiz-itp
That works just fine 🙂🙂
Thank you for the quick response!

Why do you think this issue occurs though? If a user wants to see the Matchkey for the selected field, how does one go about it?

ShreyG
5 - Atom

@echuong1

Thanks for the quick response!


While the leading zeroes after INV are the prominent reason for duplicates, other invoice numbers such as INV0010 might also be duplicates in the actual use-case. Which is why we wanted to see the match percentage (Match score) to identify them and work on them later in the workflow.

Appreciate your thought 🙂


randreag
11 - Bolide

Hi @ShreyG 

 

I think it happens because there are two processes in fuzzy match one to create the Matchkey and the other to apply the matching algorithm.

I experienced that when use the same field for the 2 (as you did) the key generated is too different and makes the match to fail.

 

I have had a use case with people names, so I put the whole name as a Matchkey and for last name and first name the algortihm. In that way the similar names are captured by the matching algorithm and the key is generated as similar even when is missing more than 3 letters.

 

I hope is a bit clear

 

 

Labels