Alteryx Designer Desktop Discussions

DecomposeUnicodeForMatch Character List

djaipras

Is there an exact list of the characters that DecomposeUnicodeForMatch removes accents from and expands?

 

It does not appear to cover all characters in the UTF-8 character set. I'm seeing instances where it transliterates inconsistently, e.g.,

 

Similar characters not handled the same way:

LATIN CAPITAL LETTER AE WITH MACRON (U+01E2) c7a2 is converted to "ae"

but

LATIN SMALL LETTER AE (U+00E6) c3a6 is not converted to "ae" 

 

Non-accented characters not handled uniformly:

LATIN SMALL LETTER LONG S (U+017F) c5bf is converted to "s"

but 

LATIN SMALL LETTER O WITH STROKE (U+00F8) c3b8 is not converted to "o"
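For comparison, here is a minimal Python sketch of what plain Unicode decomposition does with these four characters. This is not Alteryx's implementation; it only assumes the function may be doing something similar (NFKD decomposition, dropping combining marks, lowercasing). The helper name `strip_marks` is hypothetical.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """NFKD-decompose, drop combining marks, then lowercase.

    Assumption: this only approximates an accent-stripping match key;
    it is not how DecomposeUnicodeForMatch is documented to work.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    kept = (ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(kept).lower()

# The four characters from the examples above.
samples = ["\u01e2", "\u00e6", "\u017f", "\u00f8"]

for ch in samples:
    # An empty decomposition string means Unicode defines no mapping
    # to simpler characters for this code point.
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"decomposition={unicodedata.decomposition(ch)!r} "
        f"-> {strip_marks(ch)!r}"
    )
```

In the Unicode data, U+017F has a compatibility decomposition to "s", and U+01E2 decomposes to U+00C6 plus a combining macron, while U+00E6 and U+00F8 have no decomposition at all. That might explain why long s is converted but ø and æ are left alone, though it would not by itself explain U+01E2 becoming the two letters "ae".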

 

 

I know the function has limitations, but it's unclear what it converts and what those limitations are. The function's description only mentions "non-western" character sets.

 

Is there a list of the exact character sets that it's limited to?
