Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Identifying sales for a specific person using Fuzzy Match (or alternatives)?

NaiLo
8 - Asteroid

Hi folks,

 

I have a situation I'm trying to resolve using the Fuzzy Match tool, but I just can't get it to work. I'm hoping someone can help me with my Fuzzy Match configuration, or maybe an alternative to using the tool.

 

I have two datasets, one has the sales for all of the sales people in my company. The other has the sales for a specific person in my company, let's call them Sam.

Goal: I want to run some summary stats on the sales data for everyone in the company, but I want to remove the records pertaining to Sam using the data set I have for their sales.

Fields Available:

In both data sets, I have the following fields:

* Sales Date/Time - The date and time when a sale occurred

* Item Category - the general category that an item sold belongs to. I have two possibilities, A and B

* SKU - the unique ID for items that were sold on a given day. More than one item could have been sold on a given day by a given salesperson, so this is NOT an Order ID.

* Price - The price that an individual item was sold for

* Quantity - the amount of an individual item that I sold

 

Notes:

* In my case, the time matters. The time of a sale should be the exact same in both data sets, but it could be within a minute or two.

* More than one item could have been sold on a given day by a salesperson, but I do NOT have an Order ID

* I also do not have a trader ID, so I am relying on the combination of date/time, SKU, Department, Price, and Quantity to find the records that most likely belong to Sam

* SKU ID needs to be exact, and Item Category needs to be exact. The fuzzy logic comes into play for the Date/Time, price, and probably quantity. Price could vary by a few cents, quantity could vary by a unit or two, and Date/Time could vary by a minute or two.

 

Example Data Set:

Dataset 1 - Sam's sales

SKUSales Date/TimeItem CategoryPriceQuantity
12342023-10-15 10:52:01A20.02100
56782023-10-15 13:45:05A12.0320
12342023-10-14 16:34:03B9.043

 

Dataset 2 - Sales for all employees

SKUSales Date/TimeItem CategoryPriceQuantity
12342023-10-15 11:05:23A15.03100
12342023-10-15 10:53:01A20.02100
92312023-10-13 13:23:33B9.043
56782023-10-15 13:45:05A12.0320
56782023-10-15 13:45:05B12.0320
12342023-10-14 16:34:03B9.023

 

In the example above, Sam's three trades would most likely be the three records shown in blue in the full employee dataset. They don't match exactly to Sam's sales data, but they are close enough that I could reasonably say they belong to Sam.

What I've tried so far

I've mostly focus on trying to configure the Fuzzy Match tool as a merge with a field that shows where each record came from, and a unique RecordID field. I've set the Match Style to "Exact" for SKU ID and "Item Category."

 

For the DateTime, price, and quantity fields, I've set a Custom Match style that generates keys for Digits Only and a Match Function that is Character (No Spaces): Best of Jaro and Levenshtein Distance. I've tried different match thresholds ranging from 20% - 80%.


What I'm getting

No matter what threshold I set, I keep getting an output that is only able to match ~100 of Sam's sales out of ~16,000. The Match Score for all fields always outputs "100," even though I've changed the threshold to different values.


Any ideas on what I'm doing wrong with the fuzzy match tool, or if I might be able to do this in a different way without Fuzzy Matching?

2 REPLIES 2
Prometheus
12 - Quasar

@NaiLo Since you set the Match Style for SKU and Item Category, you can start by joining on those fields to remove anything that has a 100% match to those fields from the dataset. You can also theoretically solve this issue by creating parameters by which you can filter. In this solution, I created a price parameter of 99 cents and a test to see if the difference between Sam's price and the dataset price is <= the absolute value of 0.99. I also created a quantity parameter of 1 and a DateTime parameter of 1 and tests for each. You can change the parameters to what you feel is best for your dataset. This is by no means the only solution, but it'll get you started in that direction.

NaiLo
8 - Asteroid

@Prometheus This worked well for my use case, thank you so much! The only thing I changed was where you wrapped the Absolute Value function. In your workbook it's wrapped around the parameter rather than the difference calculation.

 

I'm a bit bummed I still haven't found a legitimate use case for the Fuzzy Match tool. One day!

Labels
Top Solution Authors