Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Field should contain either alphabets A-Z, numbers 0-9, and certain special characters

JHAROHIT
6 - Meteoroid

Hi,

 

I have a requirement to check in a field that it should contain only alphabets A -Z (both upper and lower case) , numbers 0 - 9, symbols like   .,-,\ and latin set  like ÀÈÌÒÙàè.

 

Can someone please guide how should i be writing the regex expression and also explain the syntax.

 

Appreciate your help in advance, Thank you.

7 REPLIES 7
cjaneczko
13 - Pulsar

Try the following in a formula tool.

 

REGEX_Match([Field1], '.*[A-Za-z0-9.,\-\\ÀÈÌÒÙàè].*')
OllieClarke
15 - Aurora
15 - Aurora

Hi @JHAROHIT you might need to be a bit more specific here. RegEx is very precise.

 

For instance, in Alteryx the token '\w' contains all uppercase and lowercase latin letters, all numbers, underscores, and all letters from non-latin alphabets (e.g. greek, sanskrit etc)

 

If that's okay with you, and you don't want underscores then the following filter should work:

 

REGEX_Match([Field],'[\w.\-\\]+') AND !CONTAINS([Field],'_')


You can construct a set of ranges of acceptable characters (escaping those with RegEx meanings). This would only match unaccented latin letters, numbers, fullstops, hyphens and backslashes.

 

[A-Za-z0-9\.\-\\]+

 

 

You could continue to add accented characters to this set, either by inputting them as a list, or putting in the first and last characters in their unicode range (as with A-Z). These ranges are easily found on ASCII tables, or Wikipedia, or google, but as I say we need to know exactly which character should be allowed and which ones shouldn't

 

Hope that helps,

 

Ollie

 

JHAROHIT
6 - Meteoroid

Thank you, i tried it in a sample dataset.

 

My code in the formula tool is as below: 

If

REGEX_Match([Field 1], '.*[A-Za-z0-9.,\-\\ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸ¡¿çÇŒœßØøÅåÆæÞþÐð].*') = 0
then
'F'
else
'T'
endif

 

However, i tried to insert a '$' between the allowed characters and it did not catch it.

 
Field 1ValidityValidity_1
1234-1T
abcdefg-1T
1-2-/4/5/,-1T
&%$#@0F
ÇŒœßØø-1T
abc$12345-1T
123ÀÈÌÒÙàèìòùÁ3456-1T

 

 
 

Although i have not coded '$' in the allowed characters.

 

 

 

JHAROHIT
6 - Meteoroid

Thank you Ollie,

 

I think now i now understand how it works.

 

However, special character in middle of the allowed characters, that is not getting caught in the exception. Here i have inserted two characters '$' and 'ݿ'.

While passed separately it works but not in mid of allowed characters.

 

My code:

 

If

REGEX_Match([Field 1], '.*[A-Za-z0-9.,\-\\ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸ¡¿çÇŒœßØøÅåÆæÞþÐð].*') = 0
then
'F'
else
'T'
endif

 

 

Sample Input and output:

Field 1ValidityValidity_1
1234-1T
abcdefg-1T
1-2-/4/5/,-1T
&%$#@0F
ÇŒœßØø-1T
$0F
abc$12345-1T
123ÀÈÌÒÙàèìòùÁ3456-1T
abcݿ123445-1T
qbc12345-1T
ݿ0F
OllieClarke
15 - Aurora
15 - Aurora

Hi @JHAROHIT 

 

REGEX_MATCH() will always compare to the whole string.

 

If you want to only allow strings with those characters then

 

REGEX_MATCH([Field],'[A-Za-z0-9\.\,\-\\ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸ¡¿çÇŒœßØøÅåÆæÞþÐð]+')

will return true for those, and false if a string contains at least 1 other character

Your '.*' at the beginning and end of your match means that your current formula returns true if there is at least 1 character from the set you created

 

Does that help/make sense?

 

Ollie

cjaneczko
13 - Pulsar

How about this.

 

If

REGEX_Match([Field 1], '.*[A-Za-z0-9.,$\\\-ݿÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸ¡¿çÇŒœßØøÅåÆæÞþÐð].*') = 0
then
'F'
else
'T'
endif
JHAROHIT
6 - Meteoroid

Thank a lott, Ollie.

 

That helps and solves my issue.

 

Appreciate it.

Labels