Alteryx Designer Desktop Discussions

SOLVED

RegEx using Unicode Blocks Inquiry

IvanaF
5 - Atom

When I try to use Unicode blocks, either in the RegEx tool or in the REGEX_CountMatches formula, I cannot get correct results.

The ultimate goal is to get a character count using the \p{IsLatin} block, or variations of other blocks.

 

Could anybody kindly help or share their experience in this area, please? Or confirm that this character class syntax is not supported in Alteryx?

 

Main resources I have used so far:

 

Issues using RegEx tool (using only InLatin_Basic for simplicity):

  1. [[:InLatin_Basic:]] -  error: RegEx (#): RegEx: An invalid character class name was specified in a [[:name:]] block at character 3
  2. [:InLatin_Basic:] - no error but incorrect result - i.e. 'e' or 'm'  were not matched

As a workaround, using the Unicode range ([\U+0000-\U+007F]) seems to work, but it is cumbersome, especially when working with multiple blocks.

 

Testing workflow is attached.

Many thanks in advance.

4 REPLIES
jrgo
14 - Magnetar

Hi @IvanaF

 

Not sure if I fully understand what your end result should be, but would this expression work within the RegEx tool?

[^[:unicode:]]
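For readers who want to see what this expression selects, here is a minimal Python sketch. It assumes the Boost documentation's definition of [[:unicode:]] as any character with a code point above 255, so the negated class matches code points 0 to 255. The function name count_non_extended is hypothetical, and Python's re module stands in for the Alteryx engine, so treat it as an illustration rather than Alteryx behaviour.

```python
import re

# Assumption: [[:unicode:]] means "code point above 255" (per Boost.Regex docs),
# so [^[:unicode:]] is approximated here by the range U+0000..U+00FF.
NON_EXTENDED = re.compile(r"[\x00-\xff]")

def count_non_extended(text: str) -> int:
    """Count characters whose code point is 255 or below (hypothetical helper)."""
    return len(NON_EXTENDED.findall(text))

if __name__ == "__main__":
    # "e" and "m" are Basic Latin; "\u0416" (Cyrillic Zhe) is above 255.
    print(count_non_extended("em\u0416"))
```

Note that this range covers both Basic Latin and Latin-1 Supplement, so it is broader than InLatin_Basic alone.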


jdunkerley79
ACE Emeritus

I don't believe the Boost library (which Alteryx uses for its Regex functionality) supports Unicode blocks at present.

 

The workaround I know of is to keep doing what you were already doing: use explicit Unicode ranges.
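One way to make the range workaround less cumbersome with several blocks is to keep the ranges in a small lookup table and build the character class from it. The sketch below is in Python, not Alteryx, and only illustrates the idea: the BLOCKS table and the helper names block_class and count_in_blocks are hypothetical, the ranges are taken from the Unicode block charts, and the escape syntax accepted by the Alteryx RegEx tool may differ from Python's.

```python
import re

# Hypothetical lookup table: block name -> (first, last) code point.
# Ranges follow the Unicode block charts; the names are illustrative only.
BLOCKS = {
    "Basic_Latin": (0x0000, 0x007F),
    "Latin_1_Supplement": (0x0080, 0x00FF),
    "Latin_Extended_A": (0x0100, 0x017F),
    "Cyrillic": (0x0400, 0x04FF),
}

def block_class(*names: str) -> str:
    """Build a character class such as [\\U00000000-\\U0000007F] from block names."""
    parts = []
    for name in names:
        first, last = BLOCKS[name]
        # Python's re module accepts \UXXXXXXXX (8 hex digits) inside a pattern.
        parts.append(f"\\U{first:08X}-\\U{last:08X}")
    return "[" + "".join(parts) + "]"

def count_in_blocks(text: str, *names: str) -> int:
    """Count the characters of text that fall in any of the named blocks."""
    return len(re.findall(block_class(*names), text))

if __name__ == "__main__":
    word = "\u017dena"  # Z with caron (U+017D) followed by "ena"
    print(count_in_blocks(word, "Basic_Latin"))
    print(count_in_blocks(word, "Basic_Latin", "Latin_Extended_A"))
```

The same table could be used to generate the range expression once and paste it into the RegEx tool, assuming the ranges are rewritten in whatever escape form the tool accepts.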

IvanaF
5 - Atom

Hi @jrgo, many thanks for the suggestion; it helped me find additional resources in the Alteryx help, and it can definitely be used in certain use cases. Thanks again for your time on this.

IvanaF
5 - Atom

Hi @jdunkerley79, many thanks too for your time and advice. Cheers!
