
Alteryx Designer Desktop Discussions


DecomposeUnicodeForMatch Character List

djaipras

Is there an exact list of the characters that DecomposeUnicodeForMatch removes accents from and expands?

It does not appear to cover all characters in the Unicode (UTF-8) character set, and I'm seeing instances where it transliterates inconsistently. Examples follow.

Similar characters not handled consistently:

LATIN CAPITAL LETTER AE WITH MACRON (U+01E2, UTF-8 bytes c7a2) is converted to "ae"

but

LATIN SMALL LETTER AE (U+00E6, UTF-8 bytes c3a6) is not converted to "ae"

Non-accented characters not handled uniformly:

LATIN SMALL LETTER LONG S (U+017F, UTF-8 bytes c5bf) is converted to "s"

but

LATIN SMALL LETTER O WITH STROKE (U+00F8, UTF-8 bytes c3b8) is not converted to "o"
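
For comparison, here is a small Python sketch (standard library only) that checks which of the four characters above have a decomposition mapping in the Unicode Character Database. This is not Alteryx's implementation, which I can't see. It only assumes the function might be doing something along the lines of NFKD normalization followed by stripping combining marks. The helper names `strip_marks` and `report` are mine.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """NFKD-decompose the text, then drop combining marks.

    Assumption: this approximates a generic "decompose and remove
    accents" step; it is not taken from Alteryx documentation.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)

def report(ch: str) -> dict:
    """Collect the Unicode facts relevant to the examples above."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "utf8": ch.encode("utf-8").hex(),
        # Empty string means the character has no decomposition mapping.
        "decomposition": unicodedata.decomposition(ch),
        "stripped": strip_marks(ch),
    }

# The four characters from the examples above.
SAMPLES = ["\u01E2", "\u00E6", "\u017F", "\u00F8"]

for sample in SAMPLES:
    print(report(sample))
```

Going by the Unicode data alone, U+00E6 and U+00F8 have no decomposition mapping, which might explain why they pass through unchanged, while U+017F has a compatibility mapping to "s". U+01E2 decomposes to U+00C6 plus a combining macron, so getting "ae" back suggests the function does something beyond plain decomposition (lowercasing and ligature expansion, perhaps), and that extra step is the part I can't find documented.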

 

 

I know the function has limitations, but it's unclear what it converts and what those limitations are. The function's description only mentions "non-western" character sets.

Is there a list of the exact character sets that it's limited to?
