Fuzzy Match/Filtering out Chinese/Japanese/Character Languages

salyerm
5 - Atom

I am performing a fuzzy match on a company name field that contains many languages, including character-based languages such as Chinese and Japanese. The Fuzzy Match tool does not handle these scripts well: it raises warnings and sometimes an error (possibly because of the volume of warnings?). Does anyone know how best to deal with this? I have tried to filter out these records with the formula REGEX_MATCH([Name], "^[a-zA-Z1-9 ]+$"); however, it seems to filter out "normal" English names sporadically as well. Is there a better way to separate and evaluate these records?
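One likely reason the formula drops ordinary English names: the character class allows only letters, the digits 1-9 (not 0), and spaces, so any name containing a 0, a period, a hyphen, or an accented letter fails the match. The sketch below reproduces this in Python's re module rather than in Alteryx, on the assumption that the pattern behaves the same way there; the names are taken from the sample data posted later in this thread.

```python
import re

# The pattern from the post: letters, digits 1-9 (note: no 0), and spaces only.
ORIGINAL = re.compile(r"^[a-zA-Z1-9 ]+$")

names = [
    "TESTING TECHNOLOGY LLC 2.0",              # contains "." and "0"
    "NORTHERN TERRITORY US CO.",               # contains "."
    "ABC INVESTMENT BANK",                     # letters and spaces only
    "DEPARTMENT OF ECONOMIC DEVELOPMENT 565",  # digits are all within 1-9
]

# Names the pattern rejects even though they are plain English.
rejected = [n for n in names if not ORIGINAL.match(n)]

for n in rejected:
    print("rejected:", n)
```

Widening the class (for example to 0-9 plus common punctuation) would fix these cases, but it would still reject names such as BUÑO JUAN KZ, so an allow-list of ASCII characters is probably the wrong tool for this job.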

Deano478
12 - Quasar

@salyerm Can you provide a sample of the data you are working with?

salyerm
5 - Atom

@Deano478 Here is some data I constructed that represents some of the character diversity. At this point, I would like to filter out, before the Fuzzy Match, any rows that it cannot process (any of the character languages, since Fuzzy Match just replaces those characters with question marks: "지니어링주식회사 LTD" -> "???????? LTD").

 

Company Name
TST INDUSTRIES ELECTRIQUES
Тест бизнес-парка
TESTING TECHNOLOGY LLC 2.0
ABC INVESTMENT BANK
DEPARTMENT OF ECONOMIC DEVELOPMENT 565
CONSORCIO CONDUTO ZYX
지니어링주식회사 LTD
BUÑO JUAN KZ
建筑信息有限公司
NORTHERN TERRITORY US CO.
BØKKERVEIEN 123
CONSTRUCTION GROUP ELT
信息有限公司
TESTING TECHNOLOGY LLC
BØKKER123
CONSORCIO ZYX
ABC INVESTMENTS
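
A minimal sketch of one way to split the sample above, written in Python (for instance inside a Python tool) rather than as an Alteryx formula. Instead of allow-listing ASCII characters, it checks the Unicode name of each letter and flags any row with a letter that is not Latin script. The helper name is_latin_only is hypothetical, and treating every non-Latin script (Cyrillic included) as unsupported by Fuzzy Match is an assumption, not something confirmed in this thread.

```python
import unicodedata


def is_latin_only(name: str) -> bool:
    """Return True if every alphabetic character in `name` is a Latin letter.

    Digits, spaces, and punctuation are ignored, so "LLC 2.0" and "CO." pass.
    (Hypothetical helper; assumes non-Latin scripts should be routed away.)
    """
    for ch in name:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


sample = [
    "TST INDUSTRIES ELECTRIQUES",
    "Тест бизнес-парка",
    "TESTING TECHNOLOGY LLC 2.0",
    "ABC INVESTMENT BANK",
    "DEPARTMENT OF ECONOMIC DEVELOPMENT 565",
    "CONSORCIO CONDUTO ZYX",
    "지니어링주식회사 LTD",
    "BUÑO JUAN KZ",
    "建筑信息有限公司",
    "NORTHERN TERRITORY US CO.",
    "BØKKERVEIEN 123",
    "CONSTRUCTION GROUP ELT",
    "信息有限公司",
    "TESTING TECHNOLOGY LLC",
    "BØKKER123",
    "CONSORCIO ZYX",
    "ABC INVESTMENTS",
]

# Rows safe to send to the fuzzy match vs. rows to handle separately.
latin_rows = [n for n in sample if is_latin_only(n)]
other_rows = [n for n in sample if not is_latin_only(n)]

for n in other_rows:
    print("route around fuzzy match:", n)
```

On this sample the Cyrillic, Korean, and Chinese rows are set aside, while accented Latin names such as BUÑO JUAN KZ and BØKKERVEIEN 123 stay in the fuzzy-match stream. If Cyrillic turns out to be handled correctly by Fuzzy Match, the check could be loosened to accept names starting with "CYRILLIC" as well.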
Deano478
12 - Quasar

Hey @salyerm, have a look at the attached and let me know what you think.
