Hello, I'm struggling with Fuzzy Match.
The data is encoded in UTF-8 (because it's Korean) and I'm getting these errors.
Is there any way to solve this issue and use Fuzzy Match with Korean?
Thanks!
Hi JNK,
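Behind the Fuzzy Match tool in Alteryx are a number of different algorithms, including Jaro and Levenshtein. Unfortunately, Korean (along with Chinese and Japanese) performs very poorly with Levenshtein distance matching. These scripts aren't written as a simple sequence of single letters the way English is: each Korean character is a whole syllable block built from several letters, and Chinese and Japanese rely on logographic and syllabic characters. A small spelling difference therefore shows up as the substitution of an entire character, which is the same penalty as a completely unrelated character.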
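To illustrate the point, here is a minimal Python sketch (not how Alteryx implements its matching internally). It compares a plain Levenshtein distance computed on precomposed Hangul syllables with the distance after splitting each syllable into its component letters (jamo) using Unicode NFD normalisation. 강 and 감 differ only in their final consonant, while 강 and 놀 share no letters at all, yet at the syllable level both pairs score the same distance of 1.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def to_jamo(text: str) -> str:
    """Split precomposed Hangul syllables into conjoining jamo via canonical decomposition."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    for a, b in [("강", "감"), ("강", "놀")]:
        print(a, b,
              "syllable distance:", levenshtein(a, b),
              "jamo distance:", levenshtein(to_jamo(a), to_jamo(b)))
```

With the jamo split, the near miss (강/감) keeps a distance of 1 while the unrelated pair (강/놀) rises to 3, which is the kind of gradation an alphabet-based matcher relies on.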
One solution is to use a romanisation library to transliterate the text into lowercase Latin characters. You can then run the standard Fuzzy Match on the romanised output.
I'm attaching a small example of this, using a Python library that can be incorporated into your data prep stages before you run the Fuzzy Match.
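For readers without the attachment, here is a rough, self-contained sketch of the same idea. It is a hypothetical minimal romaniser and not the library used in the attached example: it decodes each precomposed Hangul syllable with Unicode arithmetic and maps its letters through simplified, letter-by-letter Revised Romanization tables. It deliberately ignores the sound-change rules that apply across syllable boundaries, so some words will not match their official spelling, and any non-Hangul text is passed through unchanged before the whole string is lowercased.

```python
# Hypothetical minimal romaniser (NOT the library from the attached example).
# Letter-by-letter Revised Romanization; cross-syllable sound changes are ignored.
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]            # 19 leading consonants
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa",
           "wae", "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]  # 21 vowels
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k",
          "m", "l", "l", "l", "p", "l", "m", "p", "p", "t",
          "t", "ng", "t", "t", "k", "t", "p", "t"]                    # 28 (index 0 = no final)

SYLLABLE_BASE = 0xAC00  # first precomposed Hangul syllable (가)
SYLLABLE_LAST = 0xD7A3  # last precomposed Hangul syllable (힣)


def romanise(text: str) -> str:
    """Transliterate precomposed Hangul syllables to lowercase Latin letters."""
    out = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_BASE <= code <= SYLLABLE_LAST:
            # Syllable code = base + (initial * 21 + medial) * 28 + final
            initial, rest = divmod(code - SYLLABLE_BASE, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch)  # leave non-Hangul characters untouched
    return "".join(out).lower()


if __name__ == "__main__":
    for word in ["서울", "부산", "한국"]:
        print(word, "->", romanise(word))
```

For production data a maintained romanisation library is the safer choice, since it can apply the cross-syllable rules this sketch skips; the lowercase output is what you would then feed into the Fuzzy Match tool.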
Hope this helps!
Nick