Is there an exact list of characters that DecomposeUnicodeForMatch removes accents from and expands?
It does not appear to cover every character in the Unicode character set, and it seems to transliterate inconsistently. For example:
Similar characters handled differently:
LATIN CAPITAL LETTER AE WITH MACRON (U+01E2, UTF-8 bytes C7 A2) is converted to "ae"
but
LATIN SMALL LETTER AE (U+00E6, UTF-8 bytes C3 A6) is not converted to "ae"
Non-accented characters handled inconsistently:
LATIN SMALL LETTER LONG S (U+017F, UTF-8 bytes C5 BF) is converted to "s"
but
LATIN SMALL LETTER O WITH STROKE (U+00F8, UTF-8 bytes C3 B8) is not converted to "o"
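One possible pattern, which the function's documentation does not confirm: the characters that get converted (U+01E2, U+017F) both have a decomposition mapping in the Unicode Character Database, while the ones left alone (U+00E6, U+00F8) have none. The Python sketch below checks this with the standard `unicodedata` module. `strip_marks` is a hypothetical helper implementing the common "NFKD, then drop combining marks" recipe; it is not DecomposeUnicodeForMatch itself.

```python
import unicodedata

# The four characters from the examples above.
samples = ["\u01e2", "\u00e6", "\u017f", "\u00f8"]


def strip_marks(s: str) -> str:
    """Hypothetical helper: NFKD-decompose, then drop combining marks.

    This is a common accent-stripping recipe, assumed here only for
    comparison; it is not the implementation of DecomposeUnicodeForMatch.
    """
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


for ch in samples:
    # decomposition() returns "" when the character has no mapping.
    mapping = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"decomposition={mapping!r}, stripped={strip_marks(ch)!r}")
```

Note that this recipe turns U+01E2 into "Æ" rather than "ae", so if the pattern holds, the function presumably applies some additional mapping on top of the decomposition data.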
I know the function has limitations, but it's unclear what it converts and what those limitations are. The function's description only mentions "non-western" character sets.
Is there a list of the exact character sets it is limited to?