\w not behaving as expected
According to the documentation of the RegEx tool (and general practice, AFAIK), '\w' in RegEx should be shorthand for [A-Za-z0-9_]: that is, all uppercase and lowercase unaccented Latin letters, digits and underscore.
However, in versions 2023.1+ (at least), \w matches any letter from any alphabet.
Is this expected behaviour? Has this always been the case? Is there a setting I'm missing?
Thanks,
Ollie
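The same behaviour is easy to reproduce outside Alteryx. As a minimal sketch using Python's `re` module (an assumption for illustration, not Alteryx's own engine), `\w` in a `str` pattern follows Unicode rules by default, so it accepts accented, Greek and Cyrillic letters that the documented class `[A-Za-z0-9_]` rejects. The sample string is made up:

```python
import re

# Made-up sample mixing ASCII, accented Latin (é), Greek (Ω) and Cyrillic (Ж).
sample = "abc_1 \u00e9 \u03a9 \u0416 - !"

# With str patterns, Python's re treats \w as any Unicode word character by default.
word_chars = [ch for ch in sample if re.fullmatch(r"\w", ch)]

# The class the documentation describes only accepts unaccented ASCII letters, digits and _.
ascii_only = [ch for ch in sample if re.fullmatch(r"[A-Za-z0-9_]", ch)]

print(word_chars)  # ['a', 'b', 'c', '_', '1', 'é', 'Ω', 'Ж']
print(ascii_only)  # ['a', 'b', 'c', '_', '1']
```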
I'm on 2021.4 and running your workflow produces the same results for me.
Thanks @apathetichell
So, according to Perl's documentation, it looks like Alteryx behaves as if the /a modifier is not in effect. I also noticed differences in behaviour with AMP on and off. From Alteryx's documentation, this looks like Unicode rules are being used.
So this may always have been the case, but the RegEx tool's documentation is misleading.
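Perl's /a modifier restricts \w (along with \d and \s) to ASCII. Python's closest analogue is the `re.ASCII` flag. The sketch below (Python rather than Alteryx, with made-up sample text) runs the same pattern with and without it:

```python
import re

text = "\u03a9mega caf\u00e9 plain_word"  # "Ωmega café plain_word"

# Default (Unicode) rules: \w+ spans non-ASCII letters too.
unicode_words = re.findall(r"\w+", text)

# re.ASCII plays roughly the role of Perl's /a: \w becomes [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['Ωmega', 'café', 'plain_word']
print(ascii_words)    # ['mega', 'caf', 'plain_word']
```

Note how the ASCII-restricted run silently drops the non-ASCII letters and splits words at them, which is the kind of difference the documentation implies but the tool does not show.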
Interesting.
Using the regex101.com website, it looks like the /u modifier causes the expression \w to include alphanumeric characters from non-Latin languages.
Did you find any way to turn on the /a modifier in Alteryx regex?
 
Chris
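In Python, /a-style behaviour can be requested inside the pattern itself with the inline `(?a)` modifier, and spelling out `[A-Za-z0-9_]` avoids depending on any flag at all. This is a minimal sketch assuming Python's `re`; whether Alteryx's engine accepts an inline `(?a)` is not established here, so the explicit class is the safer option in the RegEx tool. `ASCII_WORD` is a hypothetical name used only for illustration:

```python
import re

# Hypothetical helper constant (not an Alteryx setting): the ASCII-only word class,
# written out explicitly so it does not depend on any engine flag.
ASCII_WORD = r"[A-Za-z0-9_]"

text = "order_42 \u00e9t\u00e9 \u0416\u0416 id_7"  # "order_42 été ЖЖ id_7"

# Explicit class: behaves the same whether or not Unicode rules are active.
explicit = re.findall(ASCII_WORD + "+", text)

# Python-only inline modifier: (?a) at the start of the pattern acts like re.ASCII.
inline = re.findall(r"(?a)\w+", text)

print(explicit)  # ['order_42', 't', 'id_7']
print(inline)    # ['order_42', 't', 'id_7']
```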
@ChrisTX Unfortunately, I don't think we can (de)activate flags in Alteryx's RegEx (other than case insensitivity). @MarqueeCrew, for one, was asking for the ability to change the multiline flag here:
