Alteryx Designer Desktop Discussions

SOLVED

Regex_Replace [[:punct:]] except certain characters

knozawa
11 - Bolide

Hello,

 

Would it be possible to replace punctuation with blanks, except for specific characters?

 

I understand that we can use:
REGEX_Replace([Field1],"[[:punct:]]", '')

 

to replace punctuation with blanks.

 

However, I would like to keep the following characters:
()-;,.'&

 

I also noticed that the regex formula didn't replace "⊥" with a blank, although I would like it to be removed as well.

 

I've attached a sample workflow.

 

Thank you for your help in advance.

 

Sincerely,
knozawa

5 REPLIES
Thableaus
17 - Castor

Hi @knozawa 

 

See if this works:

 

REGEX_Replace([Field1],"[^-a-zA-Z0-9();,.'&]", '')

 

Cheers,

knozawa
11 - Bolide

Hi @Thableaus,

 

Thank you for your quick reply!  It worked well.

 

FYI: I added a space to the formula after '&' so that the spaces between words are not removed.

REGEX_Replace([Field1],"[^-a-zA-Z0-9();,.'& ]", '')

 

Sincerely,

knozawa

knozawa
11 - Bolide

Hi @Thableaus ,

 

I actually ran into another issue.

 

It seems that all Unicode (non-ASCII) characters were also removed by the formula:

Clínico --> Clnico

São --> So

Sørlandet --> Srlandet

Linköping --> Linkping

 

There are many Unicode characters that I don't want to remove. Do you think I should just list them in the REGEX_Replace formula, or is there another way to keep those characters?

 

I checked this link

Probably we cannot use this formula in Alteryx.

[^[:unicode:]]

But maybe I could use a Unicode range instead of listing out all the Unicode characters.

 

In that case, do you know how to add Unicode ranges to the same REGEX_Replace formula?

REGEX_Replace([Field1],"[^-a-zA-Z0-9();,.'& \U+00C0-\U+00D1]", '')

This didn't work.

 

Sincerely,

knozawa
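
The behaviour described in this post can be reproduced outside Alteryx. The sketch below uses Python's re module, which is not the regex engine Alteryx uses, so it only illustrates the character-class logic. The function names and the U+00C0 to U+00FF range (accented Latin-1 letters, wider than the range tried above) are assumptions made for the example.

```python
import re

# ASCII-only whitelist from the earlier reply: anything not listed is removed.
ASCII_ONLY = re.compile(r"[^-a-zA-Z0-9();,.'& ]")

# Same whitelist plus an explicit code-point range (assumed: U+00C0..U+00FF).
WITH_RANGE = re.compile("[^-a-zA-Z0-9();,.'& \u00C0-\u00FF]")


def strip_ascii_only(text: str) -> str:
    """Remove every character outside the ASCII whitelist."""
    return ASCII_ONLY.sub("", text)


def strip_with_range(text: str) -> str:
    """Remove every character outside the whitelist or the added range."""
    return WITH_RANGE.sub("", text)


for name in ["Clínico", "São", "Sørlandet", "Linköping"]:
    print(name, "->", strip_ascii_only(name), "|", strip_with_range(name))
```

Note that this particular range also keeps the two symbols × (U+00D7) and ÷ (U+00F7), and it misses letters outside Latin-1, so a range-based whitelist has to be extended for each additional script.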

 

Thableaus
17 - Castor

@knozawa 

 

Try this:

 

REGEX_Replace([Field1],"[^-\w();,.'&\s]", '')

 

I think \w stands for any letter or digit, including Unicode ones (it also matches the underscore).

 

Cheers,
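
For comparison, here is a minimal Python sketch of the same pattern. In Python 3, \w applied to a str matches Unicode letters, digits and the underscore; that Alteryx's engine treats \w the same way is an assumption based on the result reported in this thread. The function name clean is hypothetical.

```python
import re

# Keep word characters (letters, digits, underscore), whitespace,
# and the characters - ( ) ; , . ' &  and remove everything else.
KEEP = re.compile(r"[^-\w();,.'&\s]")


def clean(text: str) -> str:
    """Remove every character that is not in the whitelist above."""
    return KEEP.sub("", text)


print(clean("São Paulo (Clínico) #1!"))
print(clean("a_b ⊥ c"))
```

Because \w includes the underscore, underscores survive this pattern; if they should be removed too, they need to be handled separately.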

knozawa
11 - Bolide

Hi @Thableaus ,

 

Thank you very much! Using \w worked well for unicode letters too!

 

Sincerely,
Kazumi
