Alteryx Designer Desktop Discussions
SOLVED

How to flag an alphanumeric field if it contains Chinese characters

ckaushik001
6 - Meteoroid

I have a description column in my dataset. The descriptions contain letters, numbers, and special characters. I just need to flag the descriptions that contain Chinese characters.

IMPORTANT: The description contains characters from other languages as well (like Spanish). I just need to flag Descriptions with Chinese characters.

 

Sample Data:

Fidelización GM01
03651 CHALLENGE CUP GUAM (BR)
POST INTEGRATION (FINANCE)
03255 CUSTOMER SURVEY FIFE
75100 H25 TAX SYSTEM REFORM
50815 FATCA PROJECT
CLAIMS OPTIMIZATION
Descripción Fidelización GM01
Cambio de persona Física
Metlife Sendai Honcho Bldg.
Bóveda virtual
Remediación CFDI 20013
Digi – Lgcy Migration Build
Digi – Hosting/Managed Svcs
Digitisation – Claims Build
Digitisation –Data Mig Build
Digitisation – DPST Run
JP1 保守更新(Security)
SSO運用保守
e-Learning・WFM・SFDC
splunk保守費
ゲート保守費
各ビルメンテナンス(監視カメラ保守/HDD物理破壊)
SCL予備_1
SCL予備_2
SCL予備_3
SCL予備_4

 

Output I am expecting:

Flag_Chinese.jpg

binuacs
20 - Arcturus

@ckaushik001 One way to do this is with the REGEX_CountMatches function:

 

IIF(REGEX_CountMatches([Data], '[^ -~]') > 0, 'Y','N')

binuacs_0-1658788480280.png
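To see what the `[^ -~]` class in the formula above actually matches, here is a minimal Python sketch of the same test. It is only an illustration, not Alteryx code: the function name `flag_non_ascii` is hypothetical, and Python's `re` module stands in for the Alteryx regex engine. The class matches any character outside the printable ASCII range (space through tilde), so accented Spanish letters and the en dash are flagged as well as CJK characters.

```python
import re

# Same character class as the formula above:
# anything outside space (0x20) through tilde (0x7E).
NON_PRINTABLE_ASCII = re.compile(r"[^ -~]")


def flag_non_ascii(text: str) -> str:
    """Return 'Y' if text has at least one character outside printable ASCII."""
    return "Y" if NON_PRINTABLE_ASCII.search(text) else "N"


# A few rows taken from the sample data in the question.
samples = [
    "CLAIMS OPTIMIZATION",
    "Fidelización GM01",
    "Digi – Lgcy Migration Build",
    "SSO運用保守",
]

for row in samples:
    print(flag_non_ascii(row), row)
```

Only the first row comes back as `N`; the Spanish row and the row with the en dash are flagged along with the CJK row.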

 

 

gabrielvilella
14 - Magnetar

Hi @ckaushik001, try this:

 

[ -~]+

 

gabrielvilella_0-1658788378541.png

 

 

Qiu
21 - Polaris

@ckaushik001 
Based on this post, we can use "[^\x00-\x7F]" to match any non-ASCII character, which should be the same as any non-English character in this case, I hope 😁

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Regex-non-english-characters/td-p/4059...

0726-ckaushik001.PNG

ckaushik001
6 - Meteoroid

@binuacs, @gabrielvilella, @Qiu thanks for the prompt replies! These patterns flag the Spanish characters too (in fact, all non-English characters). I only need to flag descriptions containing Chinese characters. I am updating the sample dataset for your reference. Sorry, I have been trying multiple combinations of regex patterns with no success so far. ☹️

binuacs
20 - Arcturus

@ckaushik001 Another way to do this is with the REGEX_Replace function. You need to list all the Chinese characters in the function, like below:

 

binuacs_0-1658872044340.png

 

 

Qiu
21 - Polaris

@ckaushik001 

I think we can do it with Unicode ranges.

Strangely, \x30A0-\x30FF is supposed to cover Katakana only, but somehow it works in Alteryx.
At least it works for your sample data for now.

http://www.unicode.org/charts/

0726-ckaushik001-2.PNG
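As a rough illustration of the Unicode-range idea, here is a minimal Python sketch. It assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF). That block is shared by Chinese hanzi and Japanese kanji, so a range test cannot tell the two languages apart. The names `HAN`, `HAN_OR_KANA` and `flag_han` are hypothetical, and whether Alteryx accepts the same escape syntax has not been verified here.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs (U+4E00 to U+9FFF).
HAN = re.compile(r"[\u4e00-\u9fff]")

# Optional wider class that also covers Hiragana (U+3040 to U+309F)
# and Katakana (U+30A0 to U+30FF), for rows written only in kana.
HAN_OR_KANA = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def flag_han(text: str, pattern: re.Pattern = HAN) -> str:
    """Return 'Y' if text contains at least one character matched by pattern."""
    return "Y" if pattern.search(text) else "N"


# A few rows taken from the sample data in the question.
samples = [
    "Fidelización GM01",
    "Digi – Lgcy Migration Build",
    "SSO運用保守",
    "ゲート保守費",
    "SCL予備_1",
]

for row in samples:
    print(flag_han(row), row)
```

With the narrower `HAN` class, the Spanish row and the row with the en dash are no longer flagged. A row containing only Katakana would need the wider class, for example `flag_han("ゲート", HAN_OR_KANA)`.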
