Question: Fuzzy Matching tool for Korean

JNK
5 - Atom

Hello, I'm struggling with the Fuzzy Match tool.

The data is UTF-8 encoded (because it's Korean) and I'm getting this error:

[Screenshot of the error: JNK_0-1622467413062.png]

Is there any way to solve this issue and use Fuzzy Match with Korean?

Thanks!

DataCurious_Nick
6 - Meteoroid

Hi JNK,

 

Behind the Fuzzy Match tool in Alteryx are a number of different algorithms, including Jaro and Levenshtein. Unfortunately, Korean (along with Chinese and Japanese) performs very poorly with Levenshtein distance matching because each character stands for a whole syllable or word rather than a single letter, so two words that differ by one sound differ by an entire character.
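To make the problem concrete, here is a minimal sketch in plain Python. It is not how Alteryx implements Fuzzy Match internally; the `levenshtein` and `similarity` helpers are written just for this illustration, and the "1 minus distance over longest length" score is an assumed, commonly used normalisation. It compares two Hangul syllables that differ only in their final consonant, first as single characters and then after splitting them into their component letters with Unicode NFD normalisation.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Illustrative score: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


a, b = "강", "간"  # same initial consonant and vowel, different final consonant

# Compared as whole syllable blocks: one character each, completely different.
raw = similarity(a, b)

# Compared after NFD decomposition: three letters each, only the last differs.
decomposed = similarity(unicodedata.normalize("NFD", a),
                        unicodedata.normalize("NFD", b))

print(f"syllable blocks: {raw:.2f}, decomposed letters: {decomposed:.2f}")
```

On whole syllable blocks the two strings share nothing, while on the decomposed letters they agree on two positions out of three, which is much closer to how a reader would judge them.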

 

One solution would be to use a romanization library to convert the text to lowercase Latin characters. You can then run the standard Fuzzy Match on the romanized output.

I'm attaching a small example of this, using a Python library that can be incorporated into your data prep stages before you run the Fuzzy Match. 
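For readers without the attachment, the idea can be sketched with the standard library alone. This is an illustration, not the attached workflow or any particular romanization library: the `romanize` function and the three lookup tables are hypothetical names, and the mapping is a simplified letter-by-letter transliteration loosely based on Revised Romanization that ignores pronunciation rules. It relies on the fact that precomposed Hangul syllables are laid out arithmetically in Unicode starting at U+AC00.

```python
# Simplified letter-by-letter mapping (an assumption for illustration;
# it does not apply the pronunciation rules of official romanization).
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
           "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
FINALS = ["", "g", "kk", "gs", "n", "nj", "nh", "d", "l", "lg", "lm", "lb",
          "ls", "lt", "lp", "lh", "m", "b", "bs", "s", "ss", "ng", "j", "ch",
          "k", "t", "p", "h"]

HANGUL_BASE = 0xAC00  # first precomposed syllable
HANGUL_LAST = 0xD7A3  # last precomposed syllable


def romanize(text: str) -> str:
    """Transliterate Hangul syllables to lowercase Latin letters.

    Characters outside the Hangul syllable block are lowercased and kept.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            # syllable = base + (initial * 21 + medial) * 28 + final
            initial, rest = divmod(code - HANGUL_BASE, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch.lower())
    return "".join(out)


print(romanize("서울"))
print(romanize("한국 ABC"))
```

Because the mapping is deterministic, the same Korean string always produces the same Latin string, which is what matters for fuzzy matching; whether the output reads like a natural romanization is secondary. The romanized column can then be fed to Fuzzy Match in place of the original field.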

 

Hope this helps!
Nick
