Alteryx Designer Desktop Discussions

mustufa2019 · ‎03-03-2020

I am trying to create a fuzzy match for a table of data with different conditions. When certain conditions are met, I would like to see a report showing the matched entries. See below for an example data input and see further for the 2 example conditions and output examples:

Input Data Example

Business Unit	Voucher ID	Invoice ID	Invoice Date	Supplier ID	Gross Amt
15000	00378736	010030660388	2/13/2018	8678	5,000
15000	00378737	010030660388C	2/13/2018	8678	3,000
15000	00378595	010030660347	5/20/2018	6756	2,500
15000	00378598	010030660347	5/20/2018	6756	250,055
15000	00378743	6-434-78195	11/2/2017	56442	4,000
15000	00378763	010030550347	2/2/2019	58763	2,000

Condition 1: Supplier, Invoice Date, Gross Amount are exactly same and Invoice ID numeric values are same but the alphabetic characters are different. See example of output below: here you can see all above conditions are met so I got a report showing this.

Business Unit

Voucher ID

Invoice ID

Invoice Date

Supplier ID

Gross Amt

15000	00378736	010030660388	2/13/2018	8678	5000
15000	00378737	010030660388C	2/13/2018	8678	5000

Condition 2: Supplier, Invoice, Invoice Date, are exactly same and gross amount first 4 digits match but the digits after that are different. See example of output below: here you can see all above conditions are met so I got a report showing this.

Business Unit

Voucher ID

Invoice ID

Invoice Date

Supplier ID

Gross Amt

15000	00378736	010030660347	2/13/2018	6756	2,500
15000	00378737	010030660347	2/13/2018	6756	250,055

Is there a way to build this in Alteryx?

seinchyiwoo · ‎03-03-2020

A quick workflow on identifying records to highlight the duplicates:

and the output:

You can also separate both workflows if you need to tag records that meet both conditions differently.

Workflows attached. Hope this helps.

mustufa2019 · ‎03-04-2020

@seinchyiwoo thank you for sharing this! However, i think this will not work since it will also show duplicate values that fully match as well. I only want to see matches that meet the below condition:

Condition 1: Supplier, Invoice Date, Gross Amount are exactly same and Invoice ID numeric values are same but the alphabetic characters are different. See example of output below: here you can see all above conditions are met so I got a report showing this.

Business Unit

Voucher ID

Invoice ID

Invoice Date

Supplier ID

Gross Amt

15000	00378736	010030660388	2/13/2018	8678	5000
15000	00378737	010030660388C	2/13/2018	8678	5000

By removing the characters using the below formula, i wont be able to tell whether the duplicates flagged really meet my above condition.

seinchyiwoo · ‎03-04-2020

What about this?

I'm using multi-row formula to check the original invoiceID if they were different.

Another quick idea off the top of my head is to use "summarize" to count distinct (1)InvoiceID and (2)InvoiceID_clean.

To identify duplicates, the condition being having (1)>1 and (2)=1.

-SC

mustufa2019 · ‎03-05-2020

@seinchyiwoo I think this helps a lot, thank you so much! But in order to build my other conditions, I am trying to figure out two expressions you created so I can utilize these for my other conditions. How did the below expression lead to the first output (screenshot 1)? How did it identify record 2 was a duplicate?

if [Invoice_ID_clean] = [Row-1:Invoice_ID_clean]
and
[Supplier ID] = [Row-1:Supplier ID]
and
[Invoice Date] = [Row-1:Invoice Date]
and
[Invoice ID] !=[Row-1:Invoice ID]
then 1
else 0
endif

Screenshot 1

Also, would the below expression capture duplicates if they are not in back to back rows? e.g. if there are duplicates in row 1 and row 4, would the below expression pick it up?

if [Row+1:is this row duplicate?]=1 then 1
else [is this row duplicate?] endif

Screenshot 2:

Alteryx Designer Desktop Discussions

Fuzzy Matching

Re: Alteryx Load Time

[Sharing] Custom Formula Function 'IsPrime' to Jud...

Re: Accounting for Zero Records on Email Output

Re: Extract text up to 3rd - from a string

Re: How to force Alteryx to input the column as te...