I need to identify strings that contain accented characters in the city name field. The expected results are shown below.
Please help me identify these strings.
CIty Name Output
Ãƒ Invalid -- Accent & special character
Ã$ Invalid -- Accent & special character
ÃBC Valid -- Accent & alphabets only
ÃBC ÃBCD Valid -- Accent & alphabets only with space
ÃBC.DEF Valid -- Accent & alphabets only with period
ÃBC-DEF Valid -- Accent & alphabets only with hyphen
You could tokenize extended ASCII codes (character code 128-255) with this statement:
@nakamott was on the right track by specifying a range of characters to look for. Unfortunately, his range didn't match your requirements.
The attached workflow uses the following expression in a Formula tool to meet your criteria:
if REGEX_Match([CIty Name], "[a-zA-Z\.\-\s\u00C0-\u017F]+",0) then
Note that this passes "0" for the optional third parameter (icase), which makes the match case sensitive; the default, "1", ignores case. The regex breaks down like this:
a-z "a" to "z"
A-Z "A" to "Z". This and the preceding range are kept separate to avoid picking up the non-letter characters that sit between them ([ \ ] ^ _ and `).
\. the period character
\- the hyphen character
\s any whitespace character, which is what allows the space
\u00C0-\u017F Unicode characters in Latin-1 Supplement (from U+00C0 onward) and Latin Extended-A. You can expand this range as required.
This gives the results you're looking for.
This should cover most of your needs. Check out this page for a complete breakdown of Unicode characters.
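If you want to sanity-check the character class outside Alteryx, here is a minimal sketch in Python. It is my own translation of the pattern, not part of the attached workflow: the names `CITY_PATTERN` and `classify_city` are made up for illustration, and I'm assuming that a string counts as valid only when every character falls inside the class, which `fullmatch` enforces. Python's `re` module may treat `\s` slightly differently from the Alteryx regex engine, so treat the results as indicative only.

```python
import re

# Assumed translation of the character class used in the Formula tool:
# ASCII letters, period, hyphen, whitespace, and U+00C0 through U+017F.
CITY_PATTERN = re.compile(r"[a-zA-Z.\-\s\u00C0-\u017F]+")


def classify_city(name: str) -> str:
    """Return "Valid" if every character of name is in the class, else "Invalid"."""
    # fullmatch requires the whole string to match, not just a prefix.
    return "Valid" if CITY_PATTERN.fullmatch(name) else "Invalid"


if __name__ == "__main__":
    for sample in ["ÃBC", "ÃBC ÃBCD", "ÃBC.DEF", "ÃBC-DEF", "Ã$"]:
        print(sample, "->", classify_city(sample))
```

Keeping `a-z` and `A-Z` as two ranges matters here as well: a single `A-z` range would also accept the punctuation characters that sit between the two alphabets.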
Thanks for the solution.
The code you shared works as expected for the input below, except for the case mentioned. Could you please check whether I missed anything?
Ã_REBRO -- This should be Invalid because of underscore