Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Register for the upcoming Live Community Q&A Session - and don't forget to submit your questions for @DeanS regarding the future role of analytics here.
SOLVED

Fuzzy Matching

Highlighted
8 - Asteroid

I am trying to create a fuzzy match for a table of data with different conditions. When certain conditions are met, I would like to see a report showing the matched entries. See below for an example data input and see further for the 2 example conditions and output examples:  

 

Input Data Example

Business UnitVoucher IDInvoice IDInvoice DateSupplier IDGross Amt
15000003787360100306603882/13/201886785,000
1500000378737010030660388C2/13/201886783,000
15000003785950100306603475/20/201867562,500
15000003785980100306603475/20/20186756250,055
15000003787436-434-7819511/2/2017564424,000
15000003787630100305503472/2/2019587632,000

 

Condition 1: Supplier, Invoice DateGross Amount are exactly same and Invoice ID numeric values are same but the alphabetic characters are different. See example of output below: here you can see all above conditions are met so I got a report showing this.

Business UnitVoucher IDInvoice IDInvoice DateSupplier IDGross Amt
15000003787360100306603882/13/201886785000
1500000378737010030660388C2/13/201886785000

 

Condition 2: Supplier, InvoiceInvoice Date, are exactly same and gross amount first 4 digits match but the digits after that are different. See example of output below: here you can see all above conditions are met so I got a report showing this.

 

Business UnitVoucher IDInvoice IDInvoice DateSupplier IDGross Amt
15000003787360100306603472/13/201867562,500
15000003787370100306603472/13/20186756250,055

 

Is there a way to build this in Alteryx? 



Highlighted
Alteryx
Alteryx

A quick workflow on identifying records to highlight the duplicates:

seinchyiwoo_0-1583285954390.png

and the output:

seinchyiwoo_1-1583286056501.png

 

You can also separate both workflows if you need to tag records that meet both conditions differently.

Workflows attached. Hope this helps.

Highlighted
8 - Asteroid

@seinchyiwoo thank you for sharing this! However, i think this will not work since it will also show duplicate values that fully match as well. I only want to see matches that meet the below condition:

 

Condition 1: Supplier, Invoice DateGross Amount are exactly same and Invoice ID numeric values are same but the alphabetic characters are different. See example of output below: here you can see all above conditions are met so I got a report showing this.

Business UnitVoucher IDInvoice IDInvoice DateSupplier IDGross Amt
15000003787360100306603882/13/201886785000
1500000378737010030660388C2/13/201886785000

    

 

By removing the characters using the below formula, i wont be able to tell whether the duplicates flagged really meet my above condition. 

Screenprint1.PNG

 

Highlighted
Alteryx
Alteryx

What about this?

seinchyiwoo_0-1583373711805.png

 

I'm using multi-row formula to check the original invoiceID if they were different.

Another quick idea off the top of my head is to use "summarize" to count distinct (1)InvoiceID and (2)InvoiceID_clean.

To identify duplicates, the condition being having (1)>1 and (2)=1.

 

-SC

Highlighted
8 - Asteroid

@seinchyiwoo I think this helps a lot, thank you so much! But in order to build my other conditions, I am trying to figure out two expressions you created so I can utilize these for my other conditions. How did the below expression lead to the first output (screenshot 1)? How did it identify record 2 was a duplicate?

 

if [Invoice_ID_clean] = [Row-1:Invoice_ID_clean]
and
[Supplier ID] = [Row-1:Supplier ID]
and
[Invoice Date] = [Row-1:Invoice Date]
and
[Invoice ID] !=[Row-1:Invoice ID]
then 1
else 0
endif

 

Screenshot 1

Print Screen 1.PNG

 

Also, would the below expression capture duplicates if they are not in back to back rows? e.g. if there are duplicates in row 1 and row 4, would the below expression pick it up? 

 

if [Row+1:is this row duplicate?]=1 then 1
else [is this row duplicate?] endif

 

Screenshot 2:

screenprint2.PNG

 

Labels