Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Question: Fuzzy Matching tool for Korean

JNK
5 - Atom

Hello, I'm struggling with Fuzzy Match.

My data is UTF-8 encoded (because it's Korean), and I get this error:

JNK_0-1622467413062.png

Is there any way to solve this issue and use Fuzzy Match with Korean?

Thanks!

1 REPLY
DataCurious_Nick
6 - Meteoroid

Hi JNK,

 

Behind the Fuzzy Match tool in Alteryx are a number of different algorithms, including Jaro and Levenshtein. Unfortunately, Korean (along with Chinese and Japanese) performs very poorly with Levenshtein distance matching because each character stands for a whole syllable or word rather than a single letter, so two strings that differ by one sound usually differ by an entire character.
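
To make that concrete, here is a rough sketch in plain Python of character-level edit distance on Korean text. It is not how Alteryx scores matches internally (that is an assumption-free zone: I don't know its exact formula); `levenshtein`, `similarity`, and `to_jamo` are names I made up for illustration. The only library call is the standard `unicodedata.normalize`, which splits a Hangul syllable into its component letters under NFD.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, counted per code point."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical score: 1 minus distance divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def to_jamo(text: str) -> str:
    """Split each Hangul syllable block into its letters (Unicode NFD)."""
    return unicodedata.normalize("NFD", text)


a, b = "한", "할"  # two syllables that differ only in the final consonant
print(similarity(a, b))                    # compared as whole syllable blocks
print(similarity(to_jamo(a), to_jamo(b)))  # compared letter by letter
```

As whole syllables the two strings share nothing, so the score is 0.0. Once they are split into letters, two of the three letters match and the score rises to about 0.67, which is much closer to what you would expect from a fuzzy match.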

 

One solution would be to use a romanization library to convert the text to lowercase Latin characters. You can then run the standard Fuzzy Match on the romanized output.

 

I'm attaching a small example of this, using a Python library that can be incorporated into your data prep stages before you run the Fuzzy Match. 
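
If you just want to see the idea without installing anything, the sketch below romanizes Hangul using only standard Python. To be clear, this is not the attached workflow and not any particular library's API: `romanize` and the three lookup tables are hypothetical names, and the mapping is a simple letter-by-letter transliteration that ignores Korean pronunciation rules, so the output can differ from official Revised Romanization (for example "hangug" rather than "hanguk"). For fuzzy matching that should be fine, since what matters is that the same Korean input always gives the same Latin output.

```python
# Letter-by-letter Latin spellings, in Unicode order (assumed mapping).
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
           "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
FINALS = ["", "g", "kk", "gs", "n", "nj", "nh", "d", "l", "lg", "lm", "lb",
          "ls", "lt", "lp", "lh", "m", "b", "bs", "s", "ss", "ng", "j", "ch",
          "k", "t", "p", "h"]

HANGUL_FIRST = 0xAC00  # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable


def romanize(text: str) -> str:
    """Convert Hangul syllables to lowercase Latin letters.

    Non-Hangul characters are passed through in lowercase.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_FIRST <= code <= HANGUL_LAST:
            # Each syllable is encoded as (initial * 21 + medial) * 28 + final.
            initial, rest = divmod(code - HANGUL_FIRST, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch.lower())
    return "".join(out)


print(romanize("한국"))
print(romanize("서울"))
```

In a workflow you could apply a function like this to the match field in a Python tool and then feed the new romanized column into Fuzzy Match.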

 

Hope this helps!
Nick
