I am trying to create a fuzzy match for a table of data with different conditions. When certain conditions are met, I would like to see a report showing the matched entries. See below for an example data input and see further for the 2 example conditions and output examples:
Input Data Example
Business Unit | Voucher ID | Invoice ID | Invoice Date | Supplier ID | Gross Amt |
15000 | 00378736 | 010030660388 | 2/13/2018 | 8678 | 5,000 |
15000 | 00378737 | 010030660388C | 2/13/2018 | 8678 | 3,000 |
15000 | 00378595 | 010030660347 | 5/20/2018 | 6756 | 2,500 |
15000 | 00378598 | 010030660347 | 5/20/2018 | 6756 | 250,055 |
15000 | 00378743 | 6-434-78195 | 11/2/2017 | 56442 | 4,000 |
15000 | 00378763 | 010030550347 | 2/2/2019 | 58763 | 2,000 |
Condition 1: Supplier, Invoice Date, Gross Amount are exactly same and Invoice ID numeric values are same but the alphabetic characters are different. See example of output below: here you can see all above conditions are met so I got a report showing this.
Business Unit | Voucher ID | Invoice ID | Invoice Date | Supplier ID | Gross Amt |
15000 | 00378736 | 010030660388 | 2/13/2018 | 8678 | 5000 |
15000 | 00378737 | 010030660388C | 2/13/2018 | 8678 | 5000 |
Condition 2: Supplier, Invoice, Invoice Date, are exactly same and gross amount first 4 digits match but the digits after that are different. See example of output below: here you can see all above conditions are met so I got a report showing this.
Business Unit | Voucher ID | Invoice ID | Invoice Date | Supplier ID | Gross Amt |
15000 | 00378736 | 010030660347 | 2/13/2018 | 6756 | 2,500 |
15000 | 00378737 | 010030660347 | 2/13/2018 | 6756 | 250,055 |
Is there a way to build this in Alteryx?
Solved! Go to Solution.
@seinchyiwoo thank you for sharing this! However, i think this will not work since it will also show duplicate values that fully match as well. I only want to see matches that meet the below condition:
Condition 1: Supplier, Invoice Date, Gross Amount are exactly same and Invoice ID numeric values are same but the alphabetic characters are different. See example of output below: here you can see all above conditions are met so I got a report showing this.
Business Unit | Voucher ID | Invoice ID | Invoice Date | Supplier ID | Gross Amt |
15000 | 00378736 | 010030660388 | 2/13/2018 | 8678 | 5000 |
15000 | 00378737 | 010030660388C | 2/13/2018 | 8678 | 5000 |
By removing the characters using the below formula, i wont be able to tell whether the duplicates flagged really meet my above condition.
What about this?
I'm using multi-row formula to check the original invoiceID if they were different.
Another quick idea off the top of my head is to use "summarize" to count distinct (1)InvoiceID and (2)InvoiceID_clean.
To identify duplicates, the condition being having (1)>1 and (2)=1.
-SC
@seinchyiwoo I think this helps a lot, thank you so much! But in order to build my other conditions, I am trying to figure out two expressions you created so I can utilize these for my other conditions. How did the below expression lead to the first output (screenshot 1)? How did it identify record 2 was a duplicate?
if [Invoice_ID_clean] = [Row-1:Invoice_ID_clean]
and
[Supplier ID] = [Row-1:Supplier ID]
and
[Invoice Date] = [Row-1:Invoice Date]
and
[Invoice ID] !=[Row-1:Invoice ID]
then 1
else 0
endif
Screenshot 1
Also, would the below expression capture duplicates if they are not in back to back rows? e.g. if there are duplicates in row 1 and row 4, would the below expression pick it up?
if [Row+1:is this row duplicate?]=1 then 1
else [is this row duplicate?] endif
Screenshot 2: