Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

(Fuzzy?) Match within a specified group

AnRo
6 - Meteoroid

Hey! 

 

After seeing how active and helpful this forum is i thought i'd come here with my problem.

 

I have 2 tables in the following format: 

 

Table 1:

CompanyOrderNumberArticle
101101001Apple x2 (100g)
102102001Apple x1 (50g)
102102002Banana (extra straight)
103103001Apple
103103002Banana (purple) 2x 80%
103103003Cherry (small, big quantity) 25398

 

Table 2:

CompanyOrderNumberArticle
101101001Apples
102102001Bananas
102102002Apples
103103001Bananas
103103002Apples
103103003Berries

 

My goal is to match Table 1 with the Articles from Table 2. A perfect result would look like this:

 

Table 3:

CompanyOrderNumberArticleTable2ArticleTable1
101101001Apple x2 (100g)Apples
102102001Apple x1 (50g)Apples
102102002Banana (extra straight)Bananas
103103001AppleApples
103103002Banana (purple) 2x 80%Bananas
103103003Cherry (small, big quantity) 25398Berries

 

 

So the goal is to accurately group the Articles from Table 1 to their corresponding articles in Table 2. Table 1 and Table 2 are historical data and Table 3 will be used as a set to train a SVM.

 

The problem i'm facing is, that you will always be able to exactly match Company and OrderNumber, but the articles within the OrderNumber are placed randomly. You can't say that the first article from Table1 Company A and OrderNumber A is the first article for Company A and OrderNumber A in Table 2. 

 

I tried to fuzzy match with an exact match on Company, exact match on OrderNumber and then different matching styles and threshholds to match every article from Table 1 to Table 2. The results are not good enough, no matter how much i tweak the fuzzy match.

 

What i think would work, but i'm not sure if that is doable in alteryx and the main reason i'm looking for help is the following:

 

Within an exact match on Company and OrderNumber match each Article from Table 1 the highest matching article from Table 2. While still not perfect, that should lead to an usable result. 

 

I'm thankful for any input.

 

 

 

 

2 REPLIES 2
MattBSlalom
11 - Bolide

I think I have a solution for you, but there were a couple issues with your sample data.  I assume you created the sample tables in Excel with a fill down on order number because it seems to have incremented on each row of a given company.  So my assumption is that the intent was to have a single order number per company with 1 - 3 articles per order number.

The second issue to overcome was that the Article from table 2 is plural (ending with an "s"), but trying to match to the singular form in table 1.  I handled this in a Formula tool at the beginning of the workflow.  The other 2 inputs & append field tools are just to attach a value ("Table1" & "Table2") to signify which dataset is which for use in the Fuzzy Match tool.

 

MattBSlalom_0-1664545216664.png

 

 

With both of those situations covered, my approach was effectively what you described already trying:  "fuzzy match with an exact match on Company, exact match on OrderNumber" and fuzzy match on Article (with thresholds set at 70%).  This get us to 5 rows matched.

Then filter out the rows that found a match from the dataset (using Transpose and a Join tool's R anchor) & match again just using only the exact match fields of Company & OrderNumber.  This gets the 6th match (even though the Articles are not at all similar).

 

Finally, Union all the matches & Join back to the original data to get the attribute fields in place.

AnRo
6 - Meteoroid

Thank you very much for your input. I'm currently not able to look into what you've provided but will do so once i'm able to. 

Labels