
Alteryx Designer Desktop Discussions


DecomposeUnicodeForMatch Character List

djaipras
7 - Meteor

Is there an exact list of the characters that DecomposeUnicodeForMatch removes accents from and expands?

 

It does not appear to cover all characters in the UTF-8 character set. I'm seeing instances where it transliterates inconsistently, e.g., 

 

Similar characters are not handled consistently:

LATIN CAPITAL LETTER AE WITH MACRON (U+01E2) c7a2 is converted to "ae"

but

LATIN SMALL LETTER AE (U+00E6) c3a6 is not converted to "ae" 

 

Non-accented characters are not handled uniformly:

LATIN SMALL LETTER LONG S (U+017F) c5bf is converted to "s"

but 

LATIN SMALL LETTER O WITH STROKE (U+00F8) c3b8 is not converted to "o"
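
For comparison, the four characters above can be checked against standard Unicode compatibility decomposition (NFKD). This is only a rough sketch in Python: it assumes, without any confirmation from the Alteryx documentation, that DecomposeUnicodeForMatch behaves something like NFKD followed by removal of combining marks. The helper name strip_marks_nfkd is hypothetical and is not part of Alteryx.

```python
import unicodedata


def strip_marks_nfkd(text: str) -> str:
    """Hypothetical helper: NFKD-decompose, then drop nonspacing marks (category Mn).

    ASSUMPTION: this only approximates what an accent-stripping function might do;
    it is not the documented behavior of DecomposeUnicodeForMatch.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# The four characters from the examples above.
samples = ["\u01e2", "\u00e6", "\u017f", "\u00f8"]

for ch in samples:
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"utf8={ch.encode('utf-8').hex()} "
        f"decomposition={unicodedata.decomposition(ch)!r} "
        f"stripped={strip_marks_nfkd(ch)!r}"
    )
```

In the Unicode data, U+017F has a compatibility mapping to "s", while U+00E6 and U+00F8 have no decomposition at all, which would be consistent with three of the four results above. U+01E2 decomposes only to U+00C6 plus a combining macron, so the "ae" expansion seen for that character would need some additional step that this sketch does not cover.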

 

 

I know the function has limitations, but it's unclear what it converts and what those limitations are. The function's description only mentions "non-western" character sets.

 

Is there a list of the exact character sets that it's limited to?
