Alteryx Designer Desktop Discussions

marinamaller · ‎10-21-2020

Hi,

I have a very large dataset (millions of unique records) that I need to anonymize using random numbers so that the data cannot be reverse engineered. Each of my records already has a unique identifier, and each new ID needs to have 9 digits and be unique.

I checked other cases and all begin with sorting the dataset. I cannot do this because it would defeat the purpose. Any ideas on how I can get this done?

atcodedog05 · ‎10-21-2020

Hi @marinamaller

I have an idea for generating random numbers, but I don't know how to make sure they are unique.

I'm very interested in the solution for this 🙂

BrandonB · ‎10-21-2020

Take a look at this approach: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Masking-Data-for-Security/td-p/29834

This uses the MD5 Hash of the input string and can be truncated as desired.
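For anyone reading along outside Designer, here is a rough Python sketch of the same idea, not the workflow from the linked thread. The helper names `md5_key` and `find_collisions` are made up for illustration, and reducing the hash to 9 decimal digits with a modulo is my assumption about what "truncated" could look like. Keep in mind that a truncated hash is not guaranteed to be unique: with about a million records spread over a billion possible 9-digit keys, a few hundred collisions are to be expected, so it is worth checking for them.

```python
import hashlib


def md5_key(value: str, digits: int = 9) -> str:
    """Derive a fixed-width decimal key from the MD5 hash of `value`.

    Hypothetical helper: the modulo step is one possible way to
    "truncate" the hash to the requested number of digits.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return str(int(digest, 16) % 10**digits).zfill(digits)


def find_collisions(ids, digits: int = 9):
    """Return (first_id, later_id, key) for every key that is reused."""
    seen = {}
    collisions = []
    for original in ids:
        key = md5_key(original, digits)
        if key in seen and seen[key] != original:
            collisions.append((seen[key], original, key))
        else:
            seen[key] = original
    return collisions
```

If `find_collisions` returns anything for your data, the truncated hash alone will not give you unique IDs, and you would need a longer key or a different approach.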

TheOC · ‎10-21-2020

Hi @marinamaller

The difficult part of this is the unique part.

My main thought process on this is an adaptation of this answer:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Unique-Random-numbers/m-p/307690/highl...

Keeping things unique is easiest when the values are sequential, so it makes sense to generate a sequential Record ID (starting from a 9-digit number) with one value per record, then randomise the order of those IDs and join them back to the data on record position. This gives each record a unique, randomly assigned value!

I really hope this is viable in your case. If not, give me a shout and I'll keep trying.
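Outside of Alteryx, the same idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: `assign_random_ids` is a made-up name, the block of IDs starts at 100,000,000, and the original identifiers are assumed to be unique already (as described in the question).

```python
import random


def assign_random_ids(original_ids, start=100_000_000, seed=None):
    """Map each original ID to a unique 9-digit ID in random order.

    Hypothetical helper: builds a sequential block of IDs, shuffles it,
    and pairs it with the records by position.
    """
    n = len(original_ids)
    if start < 100_000_000 or start + n - 1 > 999_999_999:
        raise ValueError("the requested block does not fit in 9 digits")

    new_ids = list(range(start, start + n))  # sequential, so unique
    random.Random(seed).shuffle(new_ids)     # randomise the order
    return dict(zip(original_ids, new_ids))  # join on record position


mapping = assign_random_ids(["A-001", "A-002", "A-003"])
```

Because the new IDs come from a sequential block, uniqueness is guaranteed without any collision check. The `random` module is not meant for security purposes, so if that matters, `random.SystemRandom().shuffle(new_ids)` could be used instead of the seeded generator.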

marinamaller · ‎10-21-2020

Brandon, thanks so much! This solution is amazing. As a non-coder I had no idea what MD5 Unicode was, so I looked it up. I am now planning to use your solution to generate an anonymized key as a new field and then, for additional safety, sort the file by the anonymized key and assign a Record ID. The Record ID will be easier for my client to manage and to identify when I need to do any research in the original data. Thanks!

marinamaller · ‎10-21-2020

Thanks! I will try your solution if the other one doesn't work. I like the simplicity of your solution.

Marina

Use random numbers to anonymize data