
Alteryx Designer Discussions


How to identify accented characters in a string field?

Asteroid

Hi

 

I need to identify strings that contain accented characters in the city name field. The expected results are below.

Please help me identify these strings.

 

CIty Name     Output     Reason
à             Invalid    -- Accent & special character
Ã$            Invalid    -- Accent & special character
ÃBC           Valid      -- Accent & letters only
ÃBC  ÃBCD     Valid      -- Accent & letters only, with space
ÃBC.DEF       Valid      -- Accent & letters only, with period
ÃBC-DEF       Valid      -- Accent & letters only, with hyphen

 

Asteroid

You could tokenize every non-ASCII character (character code 128 and above, which includes the extended codes 128-255) with this expression:

 

[^\x00-\x7F]
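To see what that pattern actually catches, here is a minimal sketch that uses Python's `re` module as a stand-in for the Alteryx RegEx tool (the function name `non_ascii_chars` is hypothetical). It shows that the class matches every character above code 127, not only 128-255, and that ASCII symbols such as @ are not flagged at all:

```python
import re

# Any single character outside 7-bit ASCII (code point 128 or higher).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def non_ascii_chars(text):
    """Return the non-ASCII characters in text, in order of appearance."""
    return NON_ASCII.findall(text)


print(non_ascii_chars("BEAUCOUZÉ"))  # ['É']
print(non_ascii_chars("ŁÓDŹ"))       # ['Ł', 'Ó', 'Ź']
print(non_ascii_chars("A@B"))        # []
```

Because it flags accents but ignores ASCII symbols, this pattern on its own cannot separate Ã$ (Invalid) from ÃBC (Valid) in the table above.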

 

Nebula

Hi @Meena 

 

@nakamott was on the right track by specifying a range of characters to look for. Unfortunately, his range didn't match your requirements.

 

The attached workflow uses the following expression in a Formula tool to meet your criteria:

 

 

if REGEX_Match([CIty Name], "[a-zA-Z\.\-\s\u00C0-\u017F]+",0) then
	"Valid"
else
	"Invalid"
endif

 

 

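Note that this passes the optional icase parameter as "0". Per the Alteryx documentation, 0 makes the match case-sensitive, but because both a-z and A-Z are listed, case doesn't affect the result. The regex breakdown is: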

 

a-z  "a" to "z"

A-Z  "A" to "Z". This and the preceding range are kept separate because a single A-z range would also pick up the non-letter characters between them ([ \ ] ^ _ `)

\.  the period character

\-  the hyphen character

\s  whitespace (space, tab or line break)

\u00C0-\u017F  Unicode code points U+00C0 to U+017F: the letters of the Latin-1 Supplement block (À to ÿ, which also includes × and ÷) plus all of Latin Extended-A. You can expand this range as required.
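To experiment with this character class outside Alteryx, here is a rough Python translation. It is only a sketch: `classify` is a hypothetical helper, and Python's `re` is not the regex engine Alteryx uses. `re.fullmatch` requires the entire string to consist of allowed characters, so a single `$`, `_`, `@` or `^` makes the name Invalid:

```python
import re

# Same character class as the formula: ASCII letters, period, hyphen,
# whitespace, and code points U+00C0 through U+017F.
ALLOWED = re.compile(r"[a-zA-Z\.\-\s\u00C0-\u017F]+")


def classify(city_name):
    """Return "Valid" only if every character of city_name is in the class."""
    return "Valid" if ALLOWED.fullmatch(city_name) else "Invalid"


samples = ["Ã$", "ÃBC", "ÃBC  ÃBCD", "ÃBC.DEF", "ÃBC-DEF",
           "BEAUCOUZÉ", "Ã_REBRO", "A@B", "A^B"]
for name in samples:
    print(f"{name!r:14} {classify(name)}")
```

One caveat of the code-point range: U+00C0 to U+017F also contains the multiplication sign × (U+00D7) and the division sign ÷ (U+00F7), so `classify("A×B")` returns "Valid" even though × is not a letter.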

 

This gives the results you're looking for:

 

[Screenshot: Results.png]

 

 

This should cover most of your needs. Check out this page for a complete breakdown of Unicode characters.

 

 

Dan

Asteroid

Hi Dan

 

Thanks for the solution.

The code you shared works as expected for the input below, except for the cases noted. Could you help me see whether I missed anything?

 

 

BEAUCOUZÉ

ÅHUS

Ã_REBRO -- This should be invalid because of the underscore

ÄLMHULT

Ã_REBRO, 

A@B -- This should be invalid because of the @
A^B -- This should be invalid because of the ^

ÅHUS
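The follow-up cases can also be checked without hard-coding code-point ranges. As an alternative technique (not the one proposed in this thread), the Python sketch below normalizes the text to NFC and then tests each character with `str.isalpha()`, allowing only space, period and hyphen as extra characters. `classify_by_category` and `EXTRA_ALLOWED` are hypothetical names:

```python
import unicodedata

# Non-letter characters that are still acceptable in a city name.
EXTRA_ALLOWED = {" ", ".", "-"}


def classify_by_category(city_name):
    """Valid if non-empty and every character is a letter or in EXTRA_ALLOWED."""
    # NFC composes "E" + combining acute accent into a single "É", so
    # decomposed input is treated the same as precomposed input.
    text = unicodedata.normalize("NFC", city_name)
    if text and all(ch.isalpha() or ch in EXTRA_ALLOWED for ch in text):
        return "Valid"
    return "Invalid"


for name in ["BEAUCOUZÉ", "ÅHUS", "ÄLMHULT", "Ã_REBRO", "A@B", "A^B"]:
    print(name, classify_by_category(name))
```

Unlike the range-based class, this rejects × and ÷ because they are not letters, but it accepts letters from any script (Greek, Cyrillic and so on), so it is broader in that direction.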
