Alteryx Designer Discussions



SOLVED

Use random numbers to anonymize data

marinamaller
6 - Meteoroid

Hi,

I have a very large dataset (millions of unique records) that I need to anonymize using random numbers so that the data cannot be reverse engineered. Each of my records already has a unique identifier, and each new ID needs to have 9 digits and be unique.

I checked other cases and all begin with sorting the dataset. I cannot do this because it would defeat the purpose. Any ideas on how I can get this done?

atcodedog05
17 - Castor

Hi @marinamaller 

 

I have an idea for generating random numbers, but I don't know how to generate unique random numbers.

I'm very interested in the solution for this 🙂

BrandonB
Alteryx

Take a look at this approach: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Masking-Data-for-Security/td-p/29834

 

This uses the MD5 hash of the input string, which can be truncated as desired.
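As a rough illustration of the MD5-and-truncate idea, here is a minimal Python sketch. It is not the Alteryx workflow from the linked thread: `hashlib.md5` stands in for Alteryx's MD5 function, and the names `md5_key` and `find_collisions`, as well as the modulo-based reduction to 9 decimal digits, are assumptions made for the example. Because truncating a hash can map two different inputs to the same key, the sketch also checks for collisions, which matters when every new ID must be unique.

```python
import hashlib
from collections import Counter


def md5_key(value: str, digits: int = 9) -> str:
    """Hash `value` with MD5 and reduce it to a zero-padded decimal key.

    Hypothetical helper: the reduction (modulo 10**digits) is one possible
    way to truncate the hash, not necessarily the one used in the thread.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return str(int(digest, 16) % 10**digits).zfill(digits)


def find_collisions(values, digits: int = 9) -> dict:
    """Return every truncated key produced by more than one input value."""
    counts = Counter(md5_key(v, digits) for v in values)
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    original_ids = ["CUST-0001", "CUST-0002", "CUST-0003"]  # made-up sample IDs
    for original in original_ids:
        print(original, "->", md5_key(original))
    print("collisions:", find_collisions(original_ids))
```

With millions of records and only 10^9 possible 9-digit keys, some collisions are likely, so a check like this is worth running before relying on the truncated value alone.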

TheOC
11 - Bolide

Hi @marinamaller

The difficult part of this is guaranteeing uniqueness.

My main thought process on this is an adaptation of this answer:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Unique-Random-numbers/m-p/307690/highl...

[Image: TheOC_1-1603303849006.png]

Uniqueness is easiest to guarantee by keeping things sequential, so it makes sense to generate a sequential Record ID (starting from a 9-digit number) for each record, then randomise the order and join on record position. This gives each record a unique, random value!
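The idea above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Alteryx workflow itself: the function name `assign_random_ids`, the starting value `100_000_000`, and the optional `seed` parameter are all hypothetical. It builds a consecutive block of 9-digit numbers, shuffles that block, and pairs each shuffled number with a record by position.

```python
import random


def assign_random_ids(record_keys, start=100_000_000, seed=None):
    """Map each record key to a unique 9-digit ID in random order.

    Hypothetical helper. The IDs are a consecutive block beginning at
    `start`, shuffled, then matched to records by position.
    """
    keys = list(record_keys)
    # Every ID must still fit in 9 digits.
    if start < 100_000_000 or start + len(keys) - 1 > 999_999_999:
        raise ValueError("the ID block does not fit in 9 digits")

    new_ids = list(range(start, start + len(keys)))  # sequential, so unique
    random.Random(seed).shuffle(new_ids)             # randomise the order
    return dict(zip(keys, new_ids))                  # join on record position


if __name__ == "__main__":
    mapping = assign_random_ids(["A17", "B42", "C03", "D88"])  # made-up keys
    for original, new_id in mapping.items():
        print(original, "->", new_id)
```

Uniqueness comes from the sequential block, and the shuffle only changes which record receives which number, so the original records never need to be sorted.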

I really hope this is viable in your case. If not, give me a shout and I'll keep trying.

marinamaller
6 - Meteoroid

Brandon, thanks so much! This solution is amazing. As a non-coder I had no idea what MD5 Unicode was, so I looked it up. I am now planning to use your solution to generate an anonymized key as a new field and then, for additional safety, to sort the file by the anonymized key and assign a Record ID. The Record ID will be easier for my client to manage and to identify when I need to do any research in the original data. Thanks!!!!

marinamaller
6 - Meteoroid

Thanks! I will try your solution if the other one doesn't work. I like the simplicity of your solution.

Marina 
