Alteryx Designer Desktop Discussions
SOLVED

\w not behaving as expected

OllieClarke
15 - Aurora

According to the RegEx tool documentation (and general practice, AFAIK), '\w' in RegEx should be shorthand for [A-Za-z0-9_]. That is, all uppercase and lowercase unaccented Latin letters, digits, and the underscore.
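For comparison outside Alteryx: Python's `re` module documents that `\w` means exactly `[a-zA-Z0-9_]` only when ASCII matching is requested (`re.ASCII`). The sketch below uses Python, not Alteryx's own engine, and checks that claim for every code point in the Basic Multilingual Plane; the `disagreements` helper is made up for illustration.

```python
import re

# ASCII rules: \w should accept exactly the class in the RegEx tool documentation.
ascii_word = re.compile(r"\w", re.ASCII)
documented_class = re.compile(r"[A-Za-z0-9_]")


def disagreements(limit: int = 0x10000) -> list[int]:
    """Return code points (below limit) where the two patterns disagree."""
    return [
        cp
        for cp in range(limit)
        if bool(ascii_word.fullmatch(chr(cp))) != bool(documented_class.fullmatch(chr(cp)))
    ]


print(disagreements())  # → []
```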

However, in versions 2023.1+ (at least), \w matches characters from any alphabet, not just unaccented Latin letters.
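Python's `re` behaves the same way by default: for `str` patterns, `\w` follows Unicode rules. A minimal sketch in Python (not Alteryx), with an invented mixed-script sample string, contrasting the default `\w` with the class the documentation describes:

```python
import re

sample = "café Ωmega привет 東京 user_42"

# Default (Unicode) rules: letters from any script count as word characters.
unicode_words = re.findall(r"\w+", sample)

# The documented class: unaccented Latin letters, digits, underscore.
documented_words = re.findall(r"[A-Za-z0-9_]+", sample)

print(unicode_words)     # ['café', 'Ωmega', 'привет', '東京', 'user_42']
print(documented_words)  # ['caf', 'mega', 'user_42']
```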

Is this expected behaviour? Has this always been the case? Is there a setting I'm missing?

Thanks,

Ollie

5 replies
apathetichell
19 - Altair

I'm on 2021.4 and running your workflow produces the same results for me.

OllieClarke
15 - Aurora

Thanks @apathetichell

OllieClarke
15 - Aurora

So, according to Perl's documentation, it looks like Alteryx is behaving as if the /a modifier is not in effect. I also noticed that the behaviour differs with AMP on and off. From Alteryx's documentation, it looks like Unicode rules are being used.

So this may always have been the case, but the documentation in the RegEx tool is misleading.
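Perl's `/a` modifier has a close counterpart in Python's inline `(?a)` flag (equivalent to `re.ASCII`). The sketch below is Python, not Perl or Alteryx: it shows the flag switching `\w` between Unicode and ASCII rules, and checks that Python's Unicode rule for a single character is "`str.isalnum()` or underscore". Perl's Unicode definition of `\w` is broader in some details (for example, it also covers combining marks), so treat this as an analogy, not a description of Alteryx's engine.

```python
import re

text = "naïve façade"

# No flag: str patterns use Unicode rules (comparable to Perl without /a).
unicode_matches = re.findall(r"\w+", text)
# Inline (?a) flag: \w restricted to ASCII (comparable to Perl's /a).
ascii_matches = re.findall(r"(?a)\w+", text)

print(unicode_matches)  # ['naïve', 'façade']
print(ascii_matches)    # ['na', 've', 'fa', 'ade']

# In Python, the Unicode rule for one character is "str.isalnum() or underscore".
unicode_word = re.compile(r"\w")
assert all(
    bool(unicode_word.fullmatch(chr(cp))) == (chr(cp).isalnum() or chr(cp) == "_")
    for cp in range(0x10000)
)
```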

ChrisTX
16 - Nebula

Interesting.

Using the regex101.com website, it looks like the /u modifier causes \w to include alphanumeric characters from non-Latin alphabets.

Did you find any way to turn on the /a modifier in Alteryx regex?

Chris

OllieClarke
15 - Aurora

@ChrisTX Unfortunately, I don't think we can (de)activate flags in Alteryx's RegEx (other than case insensitivity). @MarqueeCrew has certainly asked for the ability to change the multiline flag here:

https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Ideas/Support-RegEx-Multiline-Flag/idi-p/1... 
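If per-expression flags are not available, one workaround is to avoid the shorthand and spell out the class the documentation describes: `[A-Za-z0-9_]` instead of `\w`, and `[^A-Za-z0-9_]` instead of `\W`. The sketch below demonstrates the idea in Python; the `strip_non_word` helper and the sample string are hypothetical, and it assumes Alteryx's engine treats an explicit ASCII range the same way, which is worth confirming in Designer. Because case-insensitive matching is still available in the RegEx tool, the last lines show a Python-specific edge case with that option.

```python
import re


def strip_non_word(text: str, ascii_only: bool = True) -> str:
    """Hypothetical helper: delete every character that is not a word character."""
    # \W follows Unicode rules in Python; the spelled-out negated class is ASCII-only.
    pattern = r"[^A-Za-z0-9_]" if ascii_only else r"\W"
    return re.sub(pattern, "", text)


sample = "Zürich-Ωmega_7"
print(strip_non_word(sample, ascii_only=False))  # ZürichΩmega_7
print(strip_non_word(sample))                    # Zrichmega_7

# Python-specific caveat: with re.IGNORECASE, [A-Za-z] also matches a few
# non-ASCII letters, such as the Kelvin sign (U+212A).
print(bool(re.fullmatch(r"[A-Za-z]", "\u212a", re.IGNORECASE)))  # True
```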