Hi,
I have a requirement to check in a field that it should contain only alphabets A -Z (both upper and lower case) , numbers 0 - 9, symbols like .,-,\ and latin set like ÀÈÌÒÙàè.
Can someone please guide how should i be writing the regex expression and also explain the syntax.
Appreciate your help in advance, Thank you.
Solved! Go to Solution.
Try the following in a formula tool.
REGEX_Match([Field1], '.*[A-Za-z0-9.,\-\\ÀÈÌÒÙàè].*')
Hi @JHAROHIT you might need to be a bit more specific here. RegEx is very precise.
For instance, in Alteryx the token '\w' contains all uppercase and lowercase latin letters, all numbers, underscores, and all letters from non-latin alphabets (e.g. greek, sanskrit etc)
If that's okay with you, and you don't want underscores then the following filter should work:
REGEX_Match([Field],'[\w.\-\\]+') AND !CONTAINS([Field],'_')
You can construct a set of ranges of acceptable characters (escaping those with RegEx meanings). This would only match unaccented latin letters, numbers, fullstops, hyphens and backslashes.
[A-Za-z0-9\.\-\\]+
You could continue to add accented characters to this set, either by inputting them as a list, or putting in the first and last characters in their unicode range (as with A-Z). These ranges are easily found on ASCII tables, or Wikipedia, or google, but as I say we need to know exactly which character should be allowed and which ones shouldn't
Hope that helps,
Ollie
Thank you, i tried it in a sample dataset.
My code in the formula tool is as below:
If
REGEX_Match([Field 1], '.*[A-Za-z0-9.,\-\\ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸ¡¿çÇŒœßØøÅåÆæÞþÐð].*') = 0
then
'F'
else
'T'
endif
However, i tried to insert a '$' between the allowed characters and it did not catch it.
Field 1 | Validity | Validity_1 |
1234 | -1 | T |
abcdefg | -1 | T |
1-2-/4/5/, | -1 | T |
&%$#@ | 0 | F |
ÇŒœßØø | -1 | T |
abc$12345 | -1 | T |
123ÀÈÌÒÙàèìòùÁ3456 | -1 | T |
Although i have not coded '$' in the allowed characters.
Thank you Ollie,
I think now i now understand how it works.
However, special character in middle of the allowed characters, that is not getting caught in the exception. Here i have inserted two characters '$' and 'ݿ'.
While passed separately it works but not in mid of allowed characters.
My code:
If
REGEX_Match([Field 1], '.*[A-Za-z0-9.,\-\\ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸ¡¿çÇŒœßØøÅåÆæÞþÐð].*') = 0
then
'F'
else
'T'
endif
Sample Input and output:
Field 1 | Validity | Validity_1 |
1234 | -1 | T |
abcdefg | -1 | T |
1-2-/4/5/, | -1 | T |
&%$#@ | 0 | F |
ÇŒœßØø | -1 | T |
$ | 0 | F |
abc$12345 | -1 | T |
123ÀÈÌÒÙàèìòùÁ3456 | -1 | T |
abcݿ123445 | -1 | T |
qbc12345 | -1 | T |
ݿ | 0 | F |
Hi @JHAROHIT
REGEX_MATCH() will always compare to the whole string.
If you want to only allow strings with those characters then
REGEX_MATCH([Field],'[A-Za-z0-9\.\,\-\\ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸ¡¿çÇŒœßØøÅåÆæÞþÐð]+')
will return true for those, and false if a string contains at least 1 other character
Your '.*' at the beginning and end of your match means that your current formula returns true if there is at least 1 character from the set you created
Does that help/make sense?
Ollie
How about this.
If
REGEX_Match([Field 1], '.*[A-Za-z0-9.,$\\\-ݿÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸ¡¿çÇŒœßØøÅåÆæÞþÐð].*') = 0
then
'F'
else
'T'
endif
Thank a lott, Ollie.
That helps and solves my issue.
Appreciate it.