Hi,
I have a very large dataset (millions of unique records) that I need to anonymize using random numbers so that the data cannot be reverse engineered. Each of my records already has a unique identifier, and each new ID needs to have 9 digits and be unique.
I checked other cases and they all begin with sorting the dataset. I cannot do this because it would defeat the purpose. Any ideas on how I can get this done?
I have an idea of how to generate a random number, but I don't know how to generate random numbers that are also unique.
I'm pretty interested in the solution for this 🙂
Take a look at this approach: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Masking-Data-for-Security/td-p/29834
This uses the MD5 hash of the input string, which can be truncated as desired.
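For readers outside Alteryx, here is a minimal Python sketch of the same idea, not the workflow from the linked post. The function names `md5_masked_id` and `find_collisions` are hypothetical, and reducing the hash to 9 digits with a modulo is an assumption made here to meet the 9-digit requirement. Note that a shortened hash is not guaranteed to be unique, so with millions of records it is worth checking for collisions before relying on it.

```python
import hashlib
from collections import Counter


def md5_masked_id(original_id: str, digits: int = 9) -> str:
    """Derive a fixed-length numeric key from the MD5 hash of an ID.

    Assumption: the hash is reduced to `digits` decimal digits with a
    modulo; the result is zero-padded, so it may start with 0.
    """
    digest = hashlib.md5(original_id.encode("utf-8")).hexdigest()
    number = int(digest, 16) % (10 ** digits)
    return str(number).zfill(digits)


def find_collisions(original_ids, digits: int = 9):
    """Return the masked keys that more than one original ID maps to."""
    counts = Counter(md5_masked_id(i, digits) for i in original_ids)
    return [key for key, n in counts.items() if n > 1]


if __name__ == "__main__":
    ids = ["CUST-0001", "CUST-0002", "CUST-0003"]  # made-up sample IDs
    for i in ids:
        print(i, "->", md5_masked_id(i))
    print("collisions:", find_collisions(ids))
```

The same input always gives the same key, which is handy for joining files later, but it also means anyone who can guess the original IDs could hash them and match the keys.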
Hi @marinamaller
The difficult part of this is the unique part.
My main thought process on this is an adaptation of this answer:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Unique-Random-numbers/m-p/307690/highl...
The easiest way to keep values unique is to keep them sequential. So it makes sense to generate a sequential record ID (starting from a 9-digit number) for each record, randomise the order of those IDs, and then join them back to the data on record position. This gives each record a unique, randomly assigned value!
I really hope this is viable in your case. If not, give me a shout and I'll keep trying.
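Outside Alteryx, the sequential-then-shuffle idea can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function name `assign_random_unique_ids`, the starting value 100000000, and the `seed` parameter are all made up here and are not part of the linked answer.

```python
import random


def assign_random_unique_ids(original_ids, start=100_000_000, seed=None):
    """Map each original ID to a unique 9-digit number in random order.

    Uniqueness comes from the sequential range; randomness comes from
    shuffling that range before pairing it with the records by position.
    """
    n = len(original_ids)
    if start < 100_000_000 or start + n - 1 > 999_999_999:
        raise ValueError("not enough 9-digit numbers for this many records")

    new_ids = list(range(start, start + n))  # sequential, so unique
    # Assumption: random.Random is good enough for a demo. For real
    # anonymization, random.SystemRandom() (no seed) is harder to predict.
    rng = random.Random(seed)
    rng.shuffle(new_ids)                     # break the link to input order

    # "Join on record position": the i-th record gets the i-th shuffled ID.
    return dict(zip(original_ids, new_ids))


if __name__ == "__main__":
    sample = ["CUST-0001", "CUST-0002", "CUST-0003", "CUST-0004"]
    for old, new in assign_random_unique_ids(sample).items():
        print(old, "->", new)
```

Unlike a hash, this mapping cannot be recomputed from the original ID, so the lookup table has to be stored somewhere safe if you ever need to trace a record back.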
Brandon, thanks so much! This solution is amazing. As a non-coder I had no idea what MD5 Unicode was, so I looked it up. I am now planning to use your solution to generate an anonymized key as a new field and then, for additional safety, to sort the file by the anonymized key and assign a Record ID. The Record ID will be easier for my client to manage and identify when I need to do any research in the original data. Thanks!!!!
Thanks! I will try your solution if the other one doesn't work. I like the simplicity of your solution.
Marina