Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

search field for all matches of thousands of possibles

markstephens
7 - Meteor

I need a way to find ALL matches in ONE string field that MAY contain personal data.

 

Ive got to redact personal data from a field that may contain names, postcodes, phone nums etc

The workflow checks for each of these using regex and a file of first names.

If theres a potential match, then a field is flagged with the content matched eg "Susan" and the message "check this" for a person to then check that the data is in fact personal and to be redacted.

This is because there are so many first names that are also things like "Autumn" or "St" which are not names but references to streets or seasons etc and these shouldnt be redacted.

Anyway, the match for first names uses find/replace tools that append the match as a column

But it only finds and appends on the FIRST match. I need a way to find ALL matches in a field..

for example "provide support to Susan and Mary" should find Susan and Mary, but at moment only finds the very first match, Susan.

Its got to be able to find a match for thousands of potential first names

 

Any ideas?

 

7 REPLIES 7
Jonathan-Sherman
15 - Aurora
15 - Aurora

Rather than simply using Regex to flag up potential sensitive data you could replace it with 'XXX' for example:

 

REGEX_Replace([String], '[A-Z][a-z]+', 'XXX', 0)

 

This would convert "provide support to Susan and Mary" to "provide support to XXX and XXX"

 

If this solves your issue please mark the answer as correct, if not let me know! I've attached my workbook for you to download if needed.

 

Regards,

Jonathan

 

 

OllieClarke
15 - Aurora
15 - Aurora

Hi @markstephens here's what I came up with - use the find and replace tool to replace the relevant part of the string with something distinctive, which you can then tokenise to rows. Then cross tab it to count how many flags appear in each string. 

OllieClarke_0-1574078624932.png

Hope that helps, Ollie

markstephens
7 - Meteor

hi, thanks for that, its raising some good approaches i hadnt thought of.

problem is ive got a column for every single match now! so if one record out of the 15,000 matches "Carol" than the column is there in every record.

i should have said, the redaction is so the personal data can be identified and then redacted before publishing online in the original format but with personal data redacted.

 

markstephens
7 - Meteor

thanks for that, but i should have been clearer, the string could contain ANYTHING, theres no fixed format, its free-form text designed to describe briefly what an invoice is for

 

OllieClarke
15 - Aurora
15 - Aurora

Ah @markstephens, how about this then, which gives you a row for each match rather than a column. This means that your original data could be duplicated, but you won't have the extra columns with information in them.

OllieClarke_0-1574085159435.png

 

edit: attached the correct file this time

markstephens
7 - Meteor

THATS THE ONE!

ive added a summarise tool at the end to group by record ID, and concatenate the matches, so theres one record with field with all the matches in, this is perfect, thanks a lot!

OllieClarke
15 - Aurora
15 - Aurora

@markstephens Nice! Glad I could help 🙂

Labels