RegEx using Unicode Blocks Inquiry


While trying to use Unicode blocks in either the RegEx tool or the REGEX_CountMatches formula, I am not able to get correct results.

The ultimate goal is to get a character count using the \p{IsLatin} block or variations of other blocks.


Could anybody help or share their experience in this area, please? Or confirm that this character class syntax is not supported in Alteryx?


Main resources I have used so far:


Issues using RegEx tool (using only InLatin_Basic for simplicity):

  1. [[:InLatin_Basic:]] -  error: RegEx (#): RegEx: An invalid character class name was specified in a [[:name:]] block at character 3
  2. [:InLatin_Basic:] - no error but incorrect result - i.e. 'e' or 'm'  were not matched

As a workaround, using the Unicode range seems to work OK ([\U+0000-\U+007F]), but this is a bit cumbersome, especially when working with multiple blocks.
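
For reference, the range-based workaround can be sketched outside Alteryx. The snippet below is a minimal Python illustration, not Alteryx syntax: it counts the characters that fall in the Basic Latin block (U+0000 to U+007F) using an explicit code-point range, since Python's built-in `re` module also has no `\p{...}` block classes. The function name `count_basic_latin` is made up for this example.

```python
import re

# Basic Latin block, U+0000..U+007F, written as an explicit code-point range.
BASIC_LATIN = re.compile(r"[\u0000-\u007F]")

def count_basic_latin(text: str) -> int:
    """Return how many characters of `text` fall in the Basic Latin block."""
    return len(BASIC_LATIN.findall(text))

print(count_basic_latin("em"))           # both characters are Basic Latin
print(count_basic_latin("na\u00efve"))   # U+00EF is outside the block, so it is not counted
```

The same idea carries over to any engine that accepts code-point ranges inside a character class; only the escape syntax differs.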


Testing workflow is attached.

Many thanks in advance.

Hi @IvanaF


Not sure if I fully understand what your end result should be, but would this expression work within the RegEx tool?



I don't believe the Boost library (which Alteryx uses for its Regex functionality) supports Unicode blocks at present.


The workaround I know of is to do what you were already doing with Unicode ranges.
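
To make the range approach less cumbersome across several blocks, the ranges can be kept in one lookup and assembled into a single character class. A rough Python sketch of that idea follows (again, not Alteryx syntax). The `BLOCKS` table and the `block_class` and `count_in_blocks` helpers are hypothetical names for illustration, and only a few block ranges from the Unicode standard are listed.

```python
import re

# Start/end code points for a few Unicode blocks (per the Unicode block list).
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Cyrillic": (0x0400, 0x04FF),
}

def block_class(*names: str) -> str:
    """Build a regex character class covering the named blocks."""
    parts = [f"\\u{BLOCKS[n][0]:04X}-\\u{BLOCKS[n][1]:04X}" for n in names]
    return "[" + "".join(parts) + "]"

def count_in_blocks(text: str, *names: str) -> int:
    """Count the characters of `text` that belong to any of the named blocks."""
    return len(re.findall(block_class(*names), text))

# Example: count characters in either of two Latin blocks.
print(count_in_blocks("na\u00efve", "Basic Latin", "Latin-1 Supplement"))
```

The generated pattern string could in principle be pasted into another tool, provided the escape syntax is adjusted to whatever that engine expects.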


Hi @jrgo, many thanks for the suggestion; it helped me find additional resources in the Alteryx help and can definitely be used in certain use cases. Thanks again for your time on this.


Hi @jdunkerley79, many thanks too for your time and advice. Cheers!