Hi,
I am currently trying to run some quality control on a database with some bad data. There are integers 0 - 9 found in First Name and Last Name fields where there shouldn't be, for example 342 4324 242 3424 or 1414ADGE2342AOPWO when all the records should be straightforward names. Other cases will include random punctuation marks as well and I wanted to know if there's any good way to identify all the anomalies in my database.
So far, I've been writing every possibility I can think of in a formula tool, marking any record that contains an integer or a strange punctuation as "invalid" and have the rest return as "valid." But as one can imagine, trying to account for every instance of an edge case is becoming quite verbose, my expression has 30+ lines. Is there a more sophisticated or cleaner way of doing this?
Thanks
Try this:
IF REGEX_CountMatches([Field1], '\d') > 0 THEN 'Invalid' ELSE 'valid' ENDIF
And you can go one step further and just replace the numbers with nothing:
REGEX_Replace([Field1], '\d','')
Hey @brianvtran, I've admittedly assumed that all names here are English or will at least only contain English characters. Off the top of my head, the only punctuation you should expect to see is an apostrophe or hyphen and so the following expression basically checks how many characters in the string don't match: letter, space, hyphen, apostrophe. If this is more than 0, it gets flagged and if not it is read as fine. If you have extra punctuation you want to add then you just need to add them to the group (inside the square brackets):
IF REGEX_CountMatches([Name], "[^'a-zA-Z\s-]+") > 0 then 'Invalid' else 'Fine' endif
This isn't fully foolproof though as I'd imagine there may be some instances where an apostrophe/hyphen is the offending character where it shouldn't be. For those cases, I'm not sure how you could realistically check/differentiate and may be a one for manual review.