Hey!
After seeing how active and helpful this forum is i thought i'd come here with my problem.
I have 2 tables in the following format:
Table 1:
Company | OrderNumber | Article |
101 | 101001 | Apple x2 (100g) |
102 | 102001 | Apple x1 (50g) |
102 | 102002 | Banana (extra straight) |
103 | 103001 | Apple |
103 | 103002 | Banana (purple) 2x 80% |
103 | 103003 | Cherry (small, big quantity) 25398 |
Table 2:
Company | OrderNumber | Article |
101 | 101001 | Apples |
102 | 102001 | Bananas |
102 | 102002 | Apples |
103 | 103001 | Bananas |
103 | 103002 | Apples |
103 | 103003 | Berries |
My goal is to match Table 1 with the Articles from Table 2. A perfect result would look like this:
Table 3:
Company | OrderNumber | ArticleTable2 | ArticleTable1 |
101 | 101001 | Apple x2 (100g) | Apples |
102 | 102001 | Apple x1 (50g) | Apples |
102 | 102002 | Banana (extra straight) | Bananas |
103 | 103001 | Apple | Apples |
103 | 103002 | Banana (purple) 2x 80% | Bananas |
103 | 103003 | Cherry (small, big quantity) 25398 | Berries |
So the goal is to accurately group the Articles from Table 1 to their corresponding articles in Table 2. Table 1 and Table 2 are historical data and Table 3 will be used as a set to train a SVM.
The problem i'm facing is, that you will always be able to exactly match Company and OrderNumber, but the articles within the OrderNumber are placed randomly. You can't say that the first article from Table1 Company A and OrderNumber A is the first article for Company A and OrderNumber A in Table 2.
I tried to fuzzy match with an exact match on Company, exact match on OrderNumber and then different matching styles and threshholds to match every article from Table 1 to Table 2. The results are not good enough, no matter how much i tweak the fuzzy match.
What i think would work, but i'm not sure if that is doable in alteryx and the main reason i'm looking for help is the following:
Within an exact match on Company and OrderNumber match each Article from Table 1 the highest matching article from Table 2. While still not perfect, that should lead to an usable result.
I'm thankful for any input.
I think I have a solution for you, but there were a couple issues with your sample data. I assume you created the sample tables in Excel with a fill down on order number because it seems to have incremented on each row of a given company. So my assumption is that the intent was to have a single order number per company with 1 - 3 articles per order number.
The second issue to overcome was that the Article from table 2 is plural (ending with an "s"), but trying to match to the singular form in table 1. I handled this in a Formula tool at the beginning of the workflow. The other 2 inputs & append field tools are just to attach a value ("Table1" & "Table2") to signify which dataset is which for use in the Fuzzy Match tool.
With both of those situations covered, my approach was effectively what you described already trying: "fuzzy match with an exact match on Company, exact match on OrderNumber" and fuzzy match on Article (with thresholds set at 70%). This get us to 5 rows matched.
Then filter out the rows that found a match from the dataset (using Transpose and a Join tool's R anchor) & match again just using only the exact match fields of Company & OrderNumber. This gets the 6th match (even though the Articles are not at all similar).
Finally, Union all the matches & Join back to the original data to get the attribute fields in place.
Thank you very much for your input. I'm currently not able to look into what you've provided but will do so once i'm able to.