I am performing a fuzzy match on a company name field that contains many languages, including character-based languages like Chinese and Japanese. The Fuzzy Match tool does not handle these languages well: it gives warnings and sometimes an error (possibly because of the volume of warnings?). Does anyone know how best to deal with this situation? I have tried to filter out these records with the formula [ REGEX_MATCH([Name], "^[a-zA-Z1-9 ]+$") ]; however, it seems to sporadically filter out "normal" English names as well. Is there a better way to separate and evaluate these records?
@salyerm Can you provide a sample of the data you are working with?
@Deano478 Here is some data I constructed that represents some of the character diversity. At this point, I would like to filter out, before the Fuzzy Match, the rows it cannot process (any of the character languages, since Fuzzy Match just replaces those characters with question marks: "지니어링주식회사 LTD" -> "???????? LTD").
Company Name |
TST INDUSTRIES ELECTRIQUES |
Тест бизнес-парка |
TESTING TECHNOLOGY LLC 2.0 |
ABC INVESTMENT BANK |
DEPARTMENT OF ECONOMIC DEVELOPMENT 565 |
CONSORCIO CONDUTO ZYX |
지니어링주식회사 LTD |
BUÑO JUAN KZ |
建筑信息有限公司 |
NORTHERN TERRITORY US CO. |
BØKKERVEIEN 123 |
CONSTRUCTION GROUP ELT |
信息有限公司 |
TESTING TECHNOLOGY LLC |
BØKKER123 |
CONSORCIO ZYX |
ABC INVESTMENTS |
Hey @salyerm have a look at the attached and let me know what you think?
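For anyone reading without the attachment, here is a minimal Python sketch of the filtering idea. It is not the attached workflow, just an illustration under a few assumptions. It reproduces the pattern from the question to show why some English names are dropped (the character class has no `0` and no punctuation such as `.`), and then shows two possible alternatives: a printable-ASCII check, and a hypothetical helper `is_latin_script` that keeps accented Latin names like "BØKKERVEIEN 123". Whether Fuzzy Match can actually process those accented characters is an assumption that should be checked against your own data.

```python
import re
import unicodedata

# Pattern from the question: letters, digits 1-9 and spaces only.
# Note that "0" and punctuation such as "." are not in the class.
ORIGINAL = re.compile(r"^[a-zA-Z1-9 ]+$")

# Assumed alternative 1: every character is printable ASCII (space to tilde).
ASCII_ONLY = re.compile(r"^[\x20-\x7E]+$")


def is_latin_script(name: str) -> bool:
    """Assumed alternative 2 (hypothetical helper): every letter in the
    name is a Latin-script letter; digits, spaces and punctuation are
    ignored."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in name
        if ch.isalpha()
    )


names = [
    "TESTING TECHNOLOGY LLC 2.0",
    "NORTHERN TERRITORY US CO.",
    "ABC INVESTMENT BANK",
    "BØKKERVEIEN 123",
    "Тест бизнес-парка",
    "지니어링주식회사 LTD",
    "建筑信息有限公司",
]

for n in names:
    print(
        n,
        "| original:", bool(ORIGINAL.match(n)),
        "| ascii:", bool(ASCII_ONLY.match(n)),
        "| latin:", is_latin_script(n),
    )
```

With the sample data, the original pattern rejects "TESTING TECHNOLOGY LLC 2.0" and "NORTHERN TERRITORY US CO." even though they are plain English, while both alternatives keep them. The two alternatives differ only on accented Latin names, so the choice depends on what the downstream matching step can handle.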