Hi all, for my company we have an database with labelinformation of articles such as ingredients, allegens etc. From this database i want to extract some specific information like kind of fruit.
I created a text list of all kind of fruits and by find and replace i'm getting the right data back, so far so good. Now the challenge is that i missing some words due to spellings errors or abbreviations
For example in my text input there is banana but i now i miss words like ban, banana or bananas. What is the best way to solve this? I heard something about root words but not sure how this works.
Thanks for the help
Solved! Go to Solution.
It sounds like you need to use the fuzzy matching tool.
Join the data using the master list you've created, using the standard join tool. This will remove the ones that match without problems. You can then use the find and replace tool on those.
For the remaining, take the left and right outputs, create a field on each stream called "Source" with the value being "Primary" on the left output, and "Secondary" on the right. Give each one a record ID (start them at different values so they don't use the same numbers).
Union these records (dropping the extra columns if you like. We only need the record ID, source and the text you're trying to match.
Feed this stream into the fuzzy matching tool and configure accordingly (go and read in the help section on Fuzzy Matching). This will allow you to take those records that don't match one-to-one and find the nearest matching record. This is very useful with typos, abbreviations etc.
Hope this helps,
M.
for your help, very usefull. I'm not there yet but a huge step.
Cheers Barry
Thanks M, I will do that, much appreciated
Best regards Barry