Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Filtering Out Bad Data

brianvtran
7 - Meteor

Hi, 

 

I am currently trying to run some quality control on a database with some bad data. There are integers 0 - 9 found in First Name and Last Name fields where there shouldn't be, for example 342 4324 242 3424 or 1414ADGE2342AOPWO when all the records should be straightforward names. Other cases will include random punctuation marks as well and I wanted to know if there's any good way to identify all the anomalies in my database. 

 

So far, I've been writing every possibility I can think of in a formula tool, marking any record that contains an integer or a strange punctuation as "invalid" and have the rest return as "valid." But as one can imagine, trying to account for every instance of an edge case is becoming quite verbose, my expression has 30+ lines. Is there a more sophisticated or cleaner way of doing this? 

 

Thanks

2 REPLIES 2
FilipR
11 - Bolide

Try this:

 

IF REGEX_CountMatches([Field1], '\d') > 0 THEN 'Invalid' ELSE 'valid' ENDIF

 

And you can go one step further and just replace the numbers with nothing:

 

REGEX_Replace([Field1], '\d','')

DataNath
17 - Castor

Hey @brianvtran, I've admittedly assumed that all names here are English or will at least only contain English characters. Off the top of my head, the only punctuation you should expect to see is an apostrophe or hyphen and so the following expression basically checks how many characters in the string don't match: letter, space, hyphen, apostrophe. If this is more than 0, it gets flagged and if not it is read as fine. If you have extra punctuation you want to add then you just need to add them to the group (inside the square brackets):

 

IF REGEX_CountMatches([Name], "[^'a-zA-Z\s-]+") > 0 then 'Invalid' else 'Fine' endif

 

DataNath_0-1659619054553.png

 

This isn't fully foolproof though as I'd imagine there may be some instances where an apostrophe/hyphen is the offending character where it shouldn't be. For those cases, I'm not sure how you could realistically check/differentiate and may be a one for manual review.

Labels