Hi,
I have a very large dataset (millions of unique records) that I need to anonymize using random numbers so that the data cannot be reverse engineered. Each of my records already has a unique identifier, and each new ID needs to have 9 digits and be unique.
I checked other cases and all begin with sorting the dataset. I cannot do this because it would defeat the purpose. Any ideas on how I can get this done?
I have an idea of how to generate random numbers, but I don't know how to generate unique random numbers.
I'm very interested in a solution for this 🙂
Take a look at this approach: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Masking-Data-for-Security/td-p/29834
This uses the MD5 Hash of the input string and can be truncated as desired.
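For anyone who wants to see the idea outside Alteryx, here is a minimal Python sketch, not the linked workflow itself. The function names are hypothetical, and reducing the hash modulo 10^9 is just one assumed way of "truncating" it to 9 digits. Keep in mind that a truncated hash is no longer guaranteed to be unique: with n records, roughly n²/(2·10⁹) colliding pairs are expected, which is about 500 for one million records, so a collision check is included.

```python
import hashlib


def md5_nine_digit_key(original_id: str) -> str:
    """Hash the original ID with MD5 and reduce it to a 9-digit string.

    Assumption: leading zeros are acceptable in the new ID.
    """
    digest = hashlib.md5(original_id.encode("utf-8")).hexdigest()
    # Interpret the hex digest as an integer and keep the last 9 decimal digits.
    return str(int(digest, 16) % 10**9).zfill(9)


def find_collisions(original_ids):
    """Return (first_id, second_id, key) for distinct IDs that share a key."""
    seen = {}
    collisions = []
    for original_id in original_ids:
        key = md5_nine_digit_key(original_id)
        if key in seen and seen[key] != original_id:
            collisions.append((seen[key], original_id, key))
        else:
            seen[key] = original_id
    return collisions
```

If `find_collisions` returns anything, those records would need a different treatment (for example a longer key), since the same input always hashes to the same value.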
hi @marinamaller
The difficult part of this is the unique part.
My main thought process on this is an adaptation of this answer:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Unique-Random-numbers/m-p/307690/highl...
Keeping values unique is easiest when they are sequential, so it makes sense to generate a sequential record ID (starting from a 9-figure number) for each record, then randomise the order of those IDs and join them back to the data on record position. This gives each record a unique, random value!
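In case it helps to see the logic written out, this is a rough Python sketch of the same idea rather than the Alteryx workflow. The function name, the `start` value and the `seed` parameter are all assumptions made for illustration.

```python
import random


def assign_random_unique_ids(original_ids, start=100_000_000, seed=None):
    """Map each original ID to a unique 9-digit number in random order.

    Sketch only: builds a sequential block of IDs, shuffles it, and pairs
    the shuffled IDs with the records by position.
    """
    count = len(original_ids)
    if start < 100_000_000 or start + count - 1 > 999_999_999:
        raise ValueError("the ID block does not fit in 9 digits")

    # Sequential IDs are unique by construction.
    new_ids = list(range(start, start + count))

    # Shuffling breaks any link between record order and new ID.
    random.Random(seed).shuffle(new_ids)

    # "Join on record position": the i-th record gets the i-th shuffled ID.
    return dict(zip(original_ids, new_ids))
```

The mapping has to be kept somewhere safe if the original records ever need to be looked up again. Python's default generator is not cryptographically secure, so `random.SystemRandom()` may be a better fit where that matters.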
I really hope this is viable in your case; if not, give me a shout and I'll keep trying.
Brandon, thanks so much! This solution is amazing. As a non-coder I had no idea what this MD5 Unicode was, so I looked it up. I am now planning to use your solution to generate an anonymized key as a new field and then, for additional safety, to sort the file by the anonymized key and assign a Record ID. The Record ID will be easier for my client to manage and identify when I need to do any research in the original data. Thanks!!!!
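For reference, the sort-then-number step described above could look roughly like this in Python. This is only a sketch: the function name and the starting Record ID of 1 are assumptions, and it expects the anonymized keys to be equal-length strings so that they sort consistently.

```python
def assign_record_ids(key_by_original, first_record_id=1):
    """Sort records by anonymized key and number them sequentially.

    key_by_original maps each original ID to its anonymized key.
    Returns a dict mapping each original ID to its new Record ID.
    """
    # Order the records by anonymized key, not by original ID.
    ordered = sorted(key_by_original.items(), key=lambda item: item[1])
    return {
        original_id: record_id
        for record_id, (original_id, _key) in enumerate(ordered, start=first_record_id)
    }
```

Because the ordering comes from the anonymized key, the Record ID does not reveal the original sort order of the data.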
Thanks! I will try your solution if the other one doesn't work. I like the simplicity of your solution.
Marina