Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Fuzzy Matching

mustufa2019
8 - Asteroid

I am trying to create a fuzzy match for a table of data with different conditions. When certain conditions are met, I would like to see a report showing the matched entries. See below for an example data input and see further for the 2 example conditions and output examples:  

 

Input Data Example

Business UnitVoucher IDInvoice IDInvoice DateSupplier IDGross Amt
15000003787360100306603882/13/201886785,000
1500000378737010030660388C2/13/201886783,000
15000003785950100306603475/20/201867562,500
15000003785980100306603475/20/20186756250,055
15000003787436-434-7819511/2/2017564424,000
15000003787630100305503472/2/2019587632,000

 

Condition 1: Supplier, Invoice DateGross Amount are exactly same and Invoice ID numeric values are same but the alphabetic characters are different. See example of output below: here you can see all above conditions are met so I got a report showing this.

Business UnitVoucher IDInvoice IDInvoice DateSupplier IDGross Amt
15000003787360100306603882/13/201886785000
1500000378737010030660388C2/13/201886785000

 

Condition 2: Supplier, InvoiceInvoice Date, are exactly same and gross amount first 4 digits match but the digits after that are different. See example of output below: here you can see all above conditions are met so I got a report showing this.

 

Business UnitVoucher IDInvoice IDInvoice DateSupplier IDGross Amt
15000003787360100306603472/13/201867562,500
15000003787370100306603472/13/20186756250,055

 

Is there a way to build this in Alteryx? 



4 REPLIES 4
seinchyiwoo
Alteryx Alumni (Retired)

A quick workflow on identifying records to highlight the duplicates:

seinchyiwoo_0-1583285954390.png

and the output:

seinchyiwoo_1-1583286056501.png

 

You can also separate both workflows if you need to tag records that meet both conditions differently.

Workflows attached. Hope this helps.

mustufa2019
8 - Asteroid

@seinchyiwoo thank you for sharing this! However, i think this will not work since it will also show duplicate values that fully match as well. I only want to see matches that meet the below condition:

 

Condition 1: Supplier, Invoice DateGross Amount are exactly same and Invoice ID numeric values are same but the alphabetic characters are different. See example of output below: here you can see all above conditions are met so I got a report showing this.

Business UnitVoucher IDInvoice IDInvoice DateSupplier IDGross Amt
15000003787360100306603882/13/201886785000
1500000378737010030660388C2/13/201886785000

    

 

By removing the characters using the below formula, i wont be able to tell whether the duplicates flagged really meet my above condition. 

Screenprint1.PNG

 

seinchyiwoo
Alteryx Alumni (Retired)

What about this?

seinchyiwoo_0-1583373711805.png

 

I'm using multi-row formula to check the original invoiceID if they were different.

Another quick idea off the top of my head is to use "summarize" to count distinct (1)InvoiceID and (2)InvoiceID_clean.

To identify duplicates, the condition being having (1)>1 and (2)=1.

 

-SC

mustufa2019
8 - Asteroid

@seinchyiwoo I think this helps a lot, thank you so much! But in order to build my other conditions, I am trying to figure out two expressions you created so I can utilize these for my other conditions. How did the below expression lead to the first output (screenshot 1)? How did it identify record 2 was a duplicate?

 

if [Invoice_ID_clean] = [Row-1:Invoice_ID_clean]
and
[Supplier ID] = [Row-1:Supplier ID]
and
[Invoice Date] = [Row-1:Invoice Date]
and
[Invoice ID] !=[Row-1:Invoice ID]
then 1
else 0
endif

 

Screenshot 1

Print Screen 1.PNG

 

Also, would the below expression capture duplicates if they are not in back to back rows? e.g. if there are duplicates in row 1 and row 4, would the below expression pick it up? 

 

if [Row+1:is this row duplicate?]=1 then 1
else [is this row duplicate?] endif

 

Screenshot 2:

screenprint2.PNG

 

Labels