Hi,
I am looking to use Fuzzy Match on a lookup Field. For example, I have a column with RAW ID codes which I need to map to a master Lookup and create a new column.
Eg :
Is it possible to fuzzy match against the RAW and the LOOKUP field and create a new FINALCODE column?
I've attached the xlsx of the test data if anyone would like to provide a basic workflow.
Many thanks for any help!
Solved! Go to Solution.
Hi @fiorano,
Not sure if it's a use case for fuzzy matching. If my assumption is correct, you're trying to identify if the exact string of characters in your lookup column exist in your RAW field. The reason why I don't think FM is needed is because you need it to find EXACTLY that string.
The "Find Replace" tool (Join tool category) would seem to be the more appropriate tool to use.
I have the tool configured to look for the "Find Value" in any part of the RAW field. If found, it'll append the "lookup" value instead of replacing.
Hope this helps!
Jimmy
Hi Jimmy,
Looks Like I was trying to over complicate things - thanks for this!!!
Out of interest, if for example my lookup has a rogue item in it, eg an extra 0 on the end :
RAW | lookup |
RHGFY950 | RHGFY950 |
RHGFY950 | FGGY178 |
RHGFY950 | WSDHHT87 |
_RHGFY950 | YFHHS412 |
RHGFY950 | HFFUYYR |
RHGFY950 | RJJOODS |
RHGFY950 | LMMJD873 |
RHGFY950X | UHHDO93 |
RHGFY950 | IIHOOS03 |
RHGFY950THISONE | JJOJASS03 |
RHGFY95020 | JSNKFFP67 |
RHGFY950 | HSGHL |
RHGFY950 | IIIEDJJOS64 |
RHGFY950 | HIHISPAH78 |
RHGFY950_1 | UHUGE892 |
RHGFY950 | RHGFY9500 |
Could an extra step be added to your workflow?
Thanks again!
Thanks @fiorano and happy to help!
Generally speaking, you can add whatever steps you need to address issues. For what you mentioned specifically, I'm not sure what action/step you'd like it to take or how you would validate your 'lookup' table to identify "rogue" items.
Hi @jrgo,
Of course - I can see where your coming from.
One last scenario we may come across in our RAW column is has a value very similar (eg a typo where RHGFY950 has a 'Y' missing from it and is RHGF950). Would FM be used prior to the find and replace steps ? I think this would be one of the main issues we could come up against....
RAW | lookup | FINALCODE |
RHGFY950 | RHGFY950 | RHGFY950 |
RHGFY950 | FGGY178 | RHGFY950 |
RHGFY950 | WSDHHT87 | RHGFY950 |
_RHGFY950 | YFHHS412 | RHGFY950 |
RHGFY950 | HFFUYYR | RHGFY950 |
RHGFY950 | RJJOODS | RHGFY950 |
RHGFY950 | LMMJD873 | RHGFY950 |
RHGFY950X | UHHDO93 | RHGFY950 |
RHGFY950 | IIHOOS03 | RHGFY950 |
RHGFY950THISONE | JJOJASS03 | RHGFY950 |
RHGFY95020 | JSNKFFP67 | RHGFY950 |
RHGFY950 | HSGHL | RHGFY950 |
RHGFY950 | IIIEDJJOS64 | RHGFY950 |
RHGFY950 | HIHISPAH78 | RHGFY950 |
RHGFY950_1 | UHUGE892 | RHGFY950 |
RHGF950 | HSGHL | RHGFY950 |
Yes, this is where FM may come into play. However, I'm not sure how effective your matches will be since ID codes aren't really grammatical values, which is what this tool is designed to assist with. Using the sample data set you provided, it seems like it would work, but you'd have to thoroughly validate the results when it's ran against your entire table.
It's first using the find replace tool to find the matches and then filtering to split out the records that did not get matched which would be the subset of records that would run through the FM tool.
If you aren't familiar on how to use the FM tool, I'd strongly suggest looking at the KB and the trainings available in the "Academy" section. Unlike most of the other tools, FM will require a lot of trial and error to configure the tool to generate matches within your acceptable threshold range.
Thansks @jrgo !
This is more along the lines of what I'm after. We have a lot of data coming from multiple sources and with 'dirty' data. The solution you have posted will form the basis of a larger cleansing workflow!
Many thanks for your time and expertise!
HI @jrgo
Sorry to reply to this thread but I have posted another query which is based on this thread here : https://community.alteryx.com/t5/Data-Preparation-Blending/Complex-Cleansing-and-Mapping-anyone-brav... . Just wondering if you could have a quick look and offer any advice if possible?
Many thanks for your time once again.
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
5 |