In an attempt to anonymize my data, I'm using a multi-field formula with REGEX to replace customer names in the data, and a dynamic rename with REGEX to replace customer names in field headers, like this:
Multi-Field:
IF
findstring([_CurrentField_], “Cust1”)!=-1 then
REGEX_Replace([_CurrentField_], “Cust1”, "John")
elseif findstring([_CurrentField_], “Cust1”)!=-1 then
REGEX_Replace([_CurrentField_], “Cust1”, "Paul")
else [_CurrentField_] endif
Dynamic Rename:
IF
findstring([_CurrentField_], "Cust1")!=-1 then
REGEX_Replace([_CurrentField_], "Cust1", "John")
elseif findstring([_CurrentField_], "Cust2")!=-1 then
REGEX_Replace([_CurrentField_], "Cust2", "Paul")
else [_CurrentField_] endif
The thing is, I can get away with this because I'm only replacing a small number of strings. But what if there were many strings to replace?
Is there a simpler way to accomplish this than 2 full paragraphs of REGEX code?
Solved! Go to Solution.
I would try something like Transpose to make it one field for all that I need to replace
Then use a find and replace tool to replace the entries
FInally use a Cross Tab tool to put back to original layout.
You might want to use a record ID to make the join back easier.
Just a quick idea but think would work
I'm not sure if making it one field would make too much of a difference... basically it's already one field because I'm using the multi-field tool, so it's always "current field". I was mostly wondering if there's a less verbose solution, either using a lookup table of some kind or if there's some way to word the formulas differently. This was my 4th or 5th attempt, because I soon found out that a simple Replace(String, Target, Replacement) didn't work.
What would be super cool is if some wizard out there made an anonymizer macro, where you could quickly group the fields you need to change and then it would add a column for replacement value (maybe even automatically generated like Cust001, Cust002 etc.), then create a way to replace one value with the other throughout. Hmm... sounds like a good side project for me to try.
Did a simple thing for masking names in data. If this is what you mean will do one for fields
This will create a unique ID for each name and replace the sepcified fields with Custxxx for it.
Can also be done on Field Names as well. Attached a second sample showing masking fieldnames. Masked ID are consistent on both data sets
Wow! I completely don't get what you did here, but it's impressive and fast!
It's 99.9% of what I'm looking for, so I'm marking it as correct and I encourage you to turn it into an anonymizer macro, because I think everyone would love you for that.
The only way in which it isn't perfect for me yet is that I already have one unique field, which is "customer - store#" and I'm not sure how to retain the store# part but keep the customer part (see last column in results). I'm going to send back the workflow with my pre-anonymized data. Maybe you can be the magic macro maker!
OK I think all I need to do with yours is re-create my unique store identifier by recombining the new customer name with the store number.
I'm going to try again and see what I get...
It works great for the data masking - can the name masking be modified to replace the customer name as part of a string rather than the whole field? Like "Alteryx Sales Area" and "Tableau Sales Area" would change to "Cust001 Sales Area" and "Cust 002 Sales Area"? If so, I dub thee Gandalf II.
Try the attached macro.
This can be used to mask columns selectively. It takes in a data set and masked table set. Any record not found get added to masked set and then the resulting data set and unioned masked data is returned.
Field name masking is another process but did it by taking output of Cust mask.
Let me know what you think.
Wowwwwwwwwwww!!!! Genius! I hearby dub thee Gandalf II, the Great Wizard of Alteryx!!
You should publish and advertise this, I think. It's magic!
PS it worked on my original raw file with no configuration necessary!