Alteryx Designer Desktop Discussions

JNK · 05-31-2021

Hello, I'm struggling with Fuzzy Match.

The data is encoded in UTF-8 (because it's Korean), and I've got these errors.

Is there any way to solve this issue and use Fuzzy Match with Korean text?

Thanks!

DataCurious_Nick · 06-06-2021

Hi JNK,

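Behind the Fuzzy Match tool in Alteryx are a number of different algorithms, including Jaro and Levenshtein. Unfortunately, Korean (along with Chinese and Japanese) performs very poorly with Levenshtein distance matching because its characters are not single letters of an alphabet: each Korean character is a syllable block built from two or three letters, and Chinese characters (and Japanese kanji) are logograms. Levenshtein can only compare whole characters, so a small spelling difference, such as one letter inside a Korean syllable block, counts as a full edit on what is often a very short string.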
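To see why whole-character comparison hurts, here is a minimal sketch of plain Levenshtein distance in Python, applied once to Hangul syllables and once to the same strings decomposed into jamo (the individual letters) with Unicode NFD normalisation. The names are hypothetical, and the length-normalised score is a common convention, not necessarily how Alteryx's Fuzzy Match computes its match score.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalised to 0..1 by the longer string's length (assumed convention)."""
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest


def to_jamo(s: str) -> str:
    """NFD splits each precomposed Hangul syllable into its 2-3 component jamo."""
    return unicodedata.normalize("NFD", s)


# Hypothetical example names differing in one vowel (ㅜ vs ㅓ) in the last syllable
a, b = "김민수", "김민서"

for label, x, y in [("syllables", a, b), ("jamo", to_jamo(a), to_jamo(b))]:
    print(label, len(x), levenshtein(x, y), round(similarity(x, y), 3))
# syllables 3 1 0.667
# jamo 8 1 0.875
```

The edit distance is 1 either way, but on three syllables that one edit costs a third of the string, while on eight jamo it costs only an eighth, so near-duplicates score noticeably higher once the text is broken into letters.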

One solution is to use a romanisation library to transliterate the text into lowercase Latin letters, then run the standard Fuzzy Match on the romanised output.

I'm attaching a small example of this that uses a Python library, which you can incorporate into your data-prep steps before you run Fuzzy Match.
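As a rough, library-free stand-in for the romanisation step (this is not the attached example, whose library isn't named in this thread), the sketch below converts precomposed Hangul syllables into lowercase Latin letters using Unicode's syllable arithmetic. The `romanise` function and its lookup tables are hypothetical: spellings loosely follow Revised Romanization letter by letter but skip its sound-change rules, so a dedicated romanisation library is the better choice for real data.

```python
# Hypothetical helper: simplified, letter-by-letter romanisation of Hangul.
# Table order follows Unicode's jamo ordering; spellings are loosely based on
# Revised Romanization but WITHOUT its sound-change rules.

INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]          # 19 leads
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]  # 21 vowels
FINALS = ["", "g", "kk", "gs", "n", "nj", "nh", "d", "l", "lg", "lm",
          "lb", "ls", "lt", "lp", "lh", "m", "b", "bs", "s", "ss",
          "ng", "j", "ch", "k", "t", "p", "h"]                       # 28 incl. none

SYLLABLE_BASE = 0xAC00  # '가', first precomposed Hangul syllable
SYLLABLE_LAST = 0xD7A3  # '힣', last precomposed Hangul syllable


def romanise(text: str) -> str:
    out = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_BASE <= code <= SYLLABLE_LAST:
            index = code - SYLLABLE_BASE
            lead, rest = divmod(index, 21 * 28)  # 588 syllables per initial
            vowel, tail = divmod(rest, 28)       # 28 final options per vowel
            out.append(INITIALS[lead] + VOWELS[vowel] + FINALS[tail])
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return "".join(out).lower()  # lowercase so case never affects the match


for name in ["김민수", "김민서", "서울 Seoul"]:
    print(name, "->", romanise(name))
# 김민수 -> gimminsu
# 김민서 -> gimminseo
# 서울 Seoul -> seoul seoul
```

The romanised strings can then be written to a new field and fed to the standard Fuzzy Match tool in place of the original Korean field.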

Hope this helps!
Nick

Question: Fuzzy Matching tool for Korean