Alteryx Designer Desktop Discussions

salyerm · 07-29-2024

I am performing a fuzzy match on a company name field that contains many languages, including character-based languages such as Chinese and Japanese. The Fuzzy Match tool does not handle these characters well: it gives warnings and sometimes an error (possibly because of the volume of warnings?). Does anyone know how best to deal with this situation? I have tried to filter out these records with this formula [ REGEX_MATCH([Name], "^[a-zA-Z1-9 ]+$") ]; however, it seems to sporadically filter out "normal" English names as well. Is there a better way to separate and evaluate these records?
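The formula above is an allow-list: any character outside ASCII letters, the digits 1-9, and the space makes the whole name fail. The short Python sketch below uses the standard `re` module to show which characters cause the sporadic drops. It assumes that Alteryx's regex engine evaluates this simple character class the same way, and the helper names `passes_filter` and `rejected_chars` are hypothetical.

```python
import re

# Same character class as the Alteryx formula: ASCII letters, digits 1-9 (note: no 0), and space.
ALLOWED = re.compile(r"[a-zA-Z1-9 ]+")

def passes_filter(name: str) -> bool:
    """Mirror REGEX_MATCH([Name], "^[a-zA-Z1-9 ]+$"): the whole name must match."""
    return ALLOWED.fullmatch(name) is not None

def rejected_chars(name: str) -> list[str]:
    """Characters in `name` outside the allowed class, in order of first appearance."""
    rejected: list[str] = []
    for ch in name:
        if ALLOWED.fullmatch(ch) is None and ch not in rejected:
            rejected.append(ch)
    return rejected

for name in ["ABC INVESTMENT BANK", "TESTING TECHNOLOGY LLC 2.0",
             "NORTHERN TERRITORY US CO.", "BUÑO JUAN KZ"]:
    print(name, passes_filter(name), rejected_chars(name))

# ABC INVESTMENT BANK True []
# TESTING TECHNOLOGY LLC 2.0 False ['.', '0']
# NORTHERN TERRITORY US CO. False ['.']
# BUÑO JUAN KZ False ['Ñ']
```

Periods, the digit 0, hyphens, ampersands, and accented Latin letters such as Ñ or Ø all fail the test. Adding `0` and common company-name punctuation to the class would reduce the false drops, but an allow-list still has to anticipate every legitimate character, so a test that looks for the unwanted scripts may be easier to maintain.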

Deano478 · 08-19-2024

@salyerm Can you provide a sample of the data you are working with?

salyerm · 08-19-2024

@Deano478 Here is some data I constructed that represents some of the character diversity. At this point, I would like to filter out, before the Fuzzy Match step, the rows that Fuzzy Match cannot process, i.e. any names in character-based languages, since Fuzzy Match just replaces those characters with question marks ("지니어링주식회사 LTD" -> "???????? LTD").

Company Name
TST INDUSTRIES ELECTRIQUES
Тест бизнес-парка
TESTING TECHNOLOGY LLC 2.0
ABC INVESTMENT BANK
DEPARTMENT OF ECONOMIC DEVELOPMENT 565
CONSORCIO CONDUTO ZYX
지니어링주식회사 LTD
BUÑO JUAN KZ
建筑信息有限公司
NORTHERN TERRITORY US CO.
BØKKERVEIEN 123
CONSTRUCTION GROUP ELT
信息有限公司
TESTING TECHNOLOGY LLC
BØKKER123
CONSORCIO ZYX
ABC INVESTMENTS
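Using this sample, two alternatives to an ASCII allow-list can be sketched in Python: a positive test for CJK-type scripts based on Unicode character names (via the standard `unicodedata` module), and a test for characters that a single-byte code page cannot represent. The second test rests on an unconfirmed assumption: that the "???????? LTD" output comes from Fuzzy Match converting text to a Latin-1-style code page. Encoding with `errors="replace"` reproduces that output, but the code page Alteryx actually uses (Latin-1, Windows-1252, or something else) is not established in this thread. The keyword list and the helper names `has_cjk`, `lossy_convert`, and `loses_characters` are hypothetical.

```python
import unicodedata

# Assumption: Unicode names of Chinese/Japanese ideographs, Korean Hangul, and Japanese
# kana contain one of these keywords, e.g. "CJK UNIFIED IDEOGRAPH-5EFA",
# "HANGUL SYLLABLE JI", "HIRAGANA LETTER A".
CJK_KEYWORDS = ("CJK", "HANGUL", "HIRAGANA", "KATAKANA")

def has_cjk(company_name: str) -> bool:
    """True if any character's Unicode name mentions a CJK-type script keyword."""
    return any(
        keyword in unicodedata.name(ch, "")  # "" for characters with no name
        for ch in company_name
        for keyword in CJK_KEYWORDS
    )

def lossy_convert(company_name: str, codec: str = "latin-1") -> str:
    """Encode to a single-byte codec, replacing unrepresentable characters with '?'."""
    return company_name.encode(codec, errors="replace").decode(codec)

def loses_characters(company_name: str, codec: str = "latin-1") -> bool:
    """True if converting to `codec` would replace at least one character."""
    return lossy_convert(company_name, codec) != company_name

# Sample rows from the thread.
names = [
    "TST INDUSTRIES ELECTRIQUES", "Тест бизнес-парка", "TESTING TECHNOLOGY LLC 2.0",
    "ABC INVESTMENT BANK", "DEPARTMENT OF ECONOMIC DEVELOPMENT 565",
    "CONSORCIO CONDUTO ZYX", "지니어링주식회사 LTD", "BUÑO JUAN KZ",
    "建筑信息有限公司", "NORTHERN TERRITORY US CO.", "BØKKERVEIEN 123",
    "CONSTRUCTION GROUP ELT", "信息有限公司", "TESTING TECHNOLOGY LLC",
    "BØKKER123", "CONSORCIO ZYX", "ABC INVESTMENTS",
]

print(lossy_convert("지니어링주식회사 LTD"))  # ???????? LTD

cjk_rows = [n for n in names if has_cjk(n)]
lossy_rows = [n for n in names if loses_characters(n)]
to_fuzzy_match = [n for n in names if not loses_characters(n)]

print(cjk_rows)    # ['지니어링주식회사 LTD', '建筑信息有限公司', '信息有限公司']
print(lossy_rows)  # ['Тест бизнес-парка', '지니어링주식회사 LTD', '建筑信息有限公司', '信息有限公司']
```

The two tests differ on the Cyrillic row: it contains no CJK characters, but it cannot be represented in Latin-1. Accented names such as "BUÑO JUAN KZ" and "BØKKERVEIEN 123" pass both tests. Either set of flagged rows could be routed around the Fuzzy Match step and evaluated separately; which test fits depends on which scripts Fuzzy Match actually mangles in practice.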

Deano478 · 08-21-2024

Hey @salyerm, have a look at the attached and let me know what you think.

Fuzzy Match/Filtering out Chinese/Japanese/Character Languages