Alteryx Designer Desktop Discussions

SOLVED

RegEx using Unicode Blocks Inquiry

IvanaF
5 - Atom

When I try to use Unicode blocks in either the RegEx tool or the REGEX_CountMatches formula function, I cannot get correct results.

The ultimate goal is to count characters using the \p{IsLatin} block, or similar patterns for other blocks.

Could anybody help or share their experience in this area, please? Or confirm that this character-class syntax is not supported in Alteryx?

Main resources I have used so far:

Issues with the RegEx tool (using only InLatin_Basic for simplicity):

  1. [[:InLatin_Basic:]] - error: RegEx (#): RegEx: An invalid character class name was specified in a [[:name:]] block at character 3
  2. [:InLatin_Basic:] - no error, but an incorrect result: for example, neither 'e' nor 'm' was matched
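One plausible explanation for the second result, assuming Alteryx's Boost engine follows the usual POSIX rule that a `[:name:]` class is only recognised inside an outer bracket expression: `[:InLatin_Basic:]` is then read as an ordinary set of the literal characters between the brackets (`:`, `I`, `n`, `L`, `a`, `t`, `i`, `_`, `B`, `s`, `c`), and neither 'e' nor 'm' is among them. The sketch below uses Python's `re` module, a different engine from Boost, only to illustrate that parsing.

```python
import re

# Without an outer [...], "[:InLatin_Basic:]" is just a bracket expression
# whose members are the literal characters written between the brackets.
pattern = re.compile(r"[:InLatin_Basic:]")

members = sorted(set(":InLatin_Basic:"))
print(members)  # → [':', 'B', 'I', 'L', '_', 'a', 'c', 'i', 'n', 's', 't']

for ch in "aemst":
    # 'a', 's' and 't' occur in "InLatin_Basic"; 'e' and 'm' do not
    print(ch, bool(pattern.fullmatch(ch)))
```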

As a workaround, using a Unicode range seems to work OK ([\U+0000-\U+007F]), but this is cumbersome, especially when working with multiple blocks.
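For illustration, the range workaround boils down to counting matches of a single character class. This is a minimal sketch in Python's `re` module, which writes code points as `\uXXXX` rather than in the syntax used in Alteryx; the helper name `count_basic_latin` is hypothetical.

```python
import re

# U+0000 to U+007F is the Basic Latin block (plain ASCII).
BASIC_LATIN = re.compile(r"[\u0000-\u007F]")

def count_basic_latin(text: str) -> int:
    """Count the characters of `text` that fall in the Basic Latin block."""
    return len(BASIC_LATIN.findall(text))

# 'è', 'û' and 'é' lie outside U+0000..U+007F, so they are not counted.
print(count_basic_latin("Crème brûlée"))  # → 9
```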

Testing workflow is attached.

Many thanks in advance.

jrgo
14 - Magnetar

Hi @IvanaF

I'm not sure I fully understand what your end result should be, but would this expression work in the RegEx tool?

[^[:unicode:]]
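Boost documents the `unicode` class as matching any character whose code point is above 255, so `[^[:unicode:]]` should match code points 0 to 255, which covers the Basic Latin and Latin-1 Supplement blocks together. Python's `re` has no such class, so this rough equivalent, assuming that reading of the Boost class, spells the range out explicitly; `count_latin1` is a hypothetical helper name.

```python
import re

# Python stand-in for Boost's [^[:unicode:]]: code points U+0000 to U+00FF.
NOT_UNICODE = re.compile(r"[\u0000-\u00FF]")

def count_latin1(text: str) -> int:
    """Count characters with a code point of 255 or below."""
    return len(NOT_UNICODE.findall(text))

# 'è' (U+00E8) is counted; 'Ж' (U+0416) and 'Ω' (U+03A9) are not.
print(count_latin1("Crème Ж Ω"))  # → 7
```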

jdunkerley79
ACE Emeritus

I don't believe the Boost library (which Alteryx uses for its Regex functionality) supports Unicode blocks at present.

The recommended workaround I know of is what you were already doing: using Unicode ranges.
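One way to make the range workaround less cumbersome across several blocks is to keep a small table of block ranges and generate the combined character class from it. The sketch below is written in Python with a hand-picked subset of Latin blocks; `BLOCKS`, `block_class` and `count_in_blocks` are hypothetical names. In Alteryx the same ranges would presumably be written with Boost's `\x{hhhh}` escapes instead of `\uhhhh`; that translation is an assumption and is not tested here.

```python
import re

# Hand-picked Unicode block ranges (first, last code point); extend as needed.
BLOCKS = {
    "Basic_Latin": (0x0000, 0x007F),
    "Latin-1_Supplement": (0x0080, 0x00FF),
    "Latin_Extended-A": (0x0100, 0x017F),
    "Latin_Extended-B": (0x0180, 0x024F),
}

def block_class(*names: str) -> str:
    """Build one character class covering all the named blocks."""
    parts = [f"\\u{start:04X}-\\u{end:04X}" for start, end in (BLOCKS[n] for n in names)]
    return "[" + "".join(parts) + "]"

def count_in_blocks(text: str, *names: str) -> int:
    """Count the characters of `text` that fall in any of the named blocks."""
    return len(re.findall(block_class(*names), text))

print(block_class("Basic_Latin", "Latin-1_Supplement"))
# → [\u0000-\u007F\u0080-\u00FF]
print(count_in_blocks("Łódź café", "Basic_Latin"))  # → 5
print(count_in_blocks("Łódź café", *BLOCKS))        # → 9
```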

IvanaF
5 - Atom

Hi @jrgo, many thanks for the suggestion; it helped me find additional resources in the Alteryx help and can definitely be used in certain use cases. Thanks again for your time on this.

IvanaF
5 - Atom

Hi @jdunkerley79, many thanks too for your time and advice. Cheers!