Alteryx Designer Desktop Discussions

SOLVED

Use random numbers to anonymize data

marinamaller
6 - Meteoroid

Hi,

I have a very large dataset (millions of unique records) that I need to anonymize using random numbers so that the data cannot be reverse engineered. Each of my records already has a unique identifier, and each new ID needs to have 9 digits and be unique.

I checked other cases and all begin with sorting the dataset. I cannot do this because it would defeat the purpose. Any ideas on how I can get this done?

5 REPLIES
atcodedog05
22 - Nova

Hi @marinamaller 

 

I have an idea for generating random numbers, but I don't know how to generate random numbers that are also unique.

I'm very interested in the solution for this 🙂

BrandonB
Alteryx

Take a look at this approach: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Masking-Data-for-Security/td-p/29834

 

This uses the MD5 Hash of the input string and can be truncated as desired. 
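
For readers outside Alteryx, the hash-and-truncate idea can be sketched in Python roughly as follows. This is only an illustration, not the linked workflow: the names `md5_key` and `find_collisions` and the optional `salt` parameter are made up for this example. One caveat worth checking: truncating a hash to 9 digits does not guarantee uniqueness, and with millions of records in a space of one billion values some collisions are likely, so the sketch includes a collision check.

```python
import hashlib


def md5_key(record_id: str, digits: int = 9, salt: str = "") -> str:
    """Hash an ID with MD5 and reduce it to a fixed number of decimal digits.

    `salt` is a hypothetical extra secret prepended to the ID; it is an
    assumption of this sketch, not part of the linked approach.
    """
    digest = hashlib.md5((salt + record_id).encode("utf-8")).hexdigest()
    # Interpret the hex digest as an integer and keep the last `digits` digits.
    # zfill pads with zeros, so the key may start with 0.
    return str(int(digest, 16) % 10**digits).zfill(digits)


def find_collisions(ids, digits: int = 9, salt: str = ""):
    """Return (first_id, second_id, key) for every pair that shares a key."""
    seen = {}
    collisions = []
    for rid in ids:
        key = md5_key(rid, digits, salt)
        if key in seen and seen[key] != rid:
            collisions.append((seen[key], rid, key))
        else:
            seen[key] = rid
    return collisions
```

The same input always produces the same key, which is convenient for re-running the workflow, but any colliding records reported by `find_collisions` would still need to be resolved separately.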

TheOC
15 - Aurora

Hi @marinamaller

The difficult part here is guaranteeing uniqueness.

My main thought process on this is an adaptation of this answer:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Unique-Random-numbers/m-p/307690/highl...


Uniqueness is easiest to guarantee with sequential values, so it makes sense to generate a sequential Record ID (starting from a nine-digit number) for each record, then randomise the order of those IDs and join them back to the data on record position. This gives each record a unique, random value!
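
Outside Alteryx, the same idea (sequential IDs, shuffled, then matched back by position) might look roughly like this in Python. It is a minimal sketch, not the workflow from the screenshot: the function name `assign_random_ids` and its `start` and `seed` parameters are invented for the example, and it assumes the original IDs are already unique.

```python
import random


def assign_random_ids(original_ids, start=100_000_000, seed=None):
    """Map each original ID to a unique 9-digit number in random order.

    Assumes `original_ids` contains no duplicates. `start` and `seed`
    are illustrative parameters, not part of the original workflow.
    """
    n = len(original_ids)
    if start < 100_000_000 or start + n - 1 > 999_999_999:
        raise ValueError("the ID block must fit inside the 9-digit range")
    # Sequential values are unique by construction.
    new_ids = list(range(start, start + n))
    # Shuffle so the new ID carries no information about record order.
    random.Random(seed).shuffle(new_ids)
    # Pair records and shuffled IDs by position (the "join on record position").
    return dict(zip(original_ids, new_ids))
```

For example, `assign_random_ids(["A17", "B42", "C03"])` returns a dictionary giving each of the three records a distinct value from 100000000 to 100000002 in random order. Leave `seed` as `None` for real anonymization, since a known seed makes the shuffle reproducible; if the shuffle must be unpredictable in a stronger sense, `random.SystemRandom().shuffle(new_ids)` could be swapped in.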

I really hope this is viable in your case. If not, give me a shout and I'll keep trying.


marinamaller
6 - Meteoroid

Brandon, thanks so much! This solution is amazing. As a non-coder I had no idea what this MD5 Unicode was, so I looked it up. I am now planning to use your solution to generate an anonymized key as a new field and then, for additional safety, to sort the file by the anonymized key and assign a Record ID. The Record ID will be easier for my client to manage and identify when I need to do any research in the original data. Thanks!!!!

marinamaller
6 - Meteoroid

Thanks! I will try your solution if the other one doesn't work. I like the simplicity of your solution.

Marina 
