This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I am new to the world of Alteryx and am constantly facing challenges!
My current challenge is in respect to using the fuzzy match, i am trying to use it to find matching (or similar) invoice numbers. I am matching on the RecordID, 1-10 to 51-60.
i seem to be able to get the exact matches just fine, but some of my data has truncated numbers, and maybe some transposed numbers in there as well. Basically, if the numbers look kind of similar at all, i want Alteryx to flag them. Here is kind of what my data looks like:
Reference (Inv# + Store#)
Assignment (Bottler Inv#)
Reference Key 1 (Not Sure??)
Reference Key 2 (Store#)
Amount in local currency
My only technique is to remove the first digit from from the "Reference (Inv# + Store#)" with the 1400 source after running a fuzzy match and doing it again and deleting another digit.
Hopefully this makes sense, any help would be greatly appreciated!!
40% is a very low threshold to run on a first pass, I think. What is your key length? Can you post the Match Options configuration screen? It's not looking like your data is actually complete enough to make any meaningful matches. Can you also explain why you're looking for such a nebulous match?
I started at the default threshold and i got 4 matches from my original data. these were all exact matches but i was hoping for Alteryx to be able to look at the numbers and at least give me a number/percentage to say that it could, maybe be a match.
I have different data from customers than what is in our work systems and some of it is hand keyed so there are always some errors. Other times the system may take off some of the first digits.
These are all invoice numbers, many of them have a consistent length. The largest ones are 15 characters (mostly digits, sometimes some letters in there), down to about 8 characters. the data has about 100,000 lines of data or so, so the idea is to find some that are a close match and then manually go over them to determine if they actually are.
Yesterday was my first attempt using the fuzzy match, so I am really unsure how to use it, it seems better suited for more consistent data like names, addresses, and phone numbers.
A couple things jump out to me. First of all you're not using a matching algorithm at all, so you're only able to get exact key matches. Second, you're not generating any keys, and since you're using a length longer than your longest number, it becomes a standard join. Third, Double Metaphone with digits tries to "pronounce" the "word" and then matches the numbers. Since all you have is numbers, it also won't help you.
I took your sample and ran a Character Jaro Distance and only got one match with a key length of 4, but because of the nature of this dataset I doubt fuzzy will get you a great result. Because the numbers are in a block, I think that Fuzzy Match may be less powerful than other methods. It's a bit more art than science, so there's always some tuning, but your use case probably has a better way to associate the data.
My first thought is to look for corroborating data that could be used to associate the numbers, and find a join method appropriate for whatever that data is. Perhaps names, products, service codes, city/state/zip or a combination of those are available. If it were me, I'd look for more metadata on each of these to try to join on.
If you can, for example, just get the location code isolated and convert it to VString then you could do a Jaro fuzzy to see if there were transposed characters ('41223' typed in as '42123') or substitution characters typed in erroneously ('41223' is typed in as '41222'). It may also help to read a bit about how the fuzzy algorithms work, because that can inform your decision a ton.