Hi all,
I am looking to automate the process of standardizing and cleansing data sets by removing client-specific data. Currently, I think the best way to do this is to assign each column a category and replace every value in that column with a value from a pre-made database for that category.
For example, I would have a Template (see attached), where the user would input the column headers and select the type of data each column represents. We would then use Alteryx to randomize the data based on the database values. A Company category, for instance, would replace the previous values with random companies (e.g. Company 1, 2 or 3).
However, I am having trouble actually implementing this. Does anyone have any ideas?
@jason5333
Let me confirm that this is what you want.
One or more columns will be specified, such as:
Client | Document Number | Fiscal Year | Item | Purchasing Doc.
Then the data in these columns will be randomized using some preset values?
Hi Qiu,
Thank you for your response! Yep - that's exactly what I mean. The data would be randomized based on the "Category", so for example I would have a preset database:
City
Random City 1
Random City 2
etc.
If a column is marked with the City category, all values in that column would become "Random City 1", "Random City 2", etc.
I can't seem to get past the transposing phase, where I have classified each column into its category. If you have any ideas I would love to hear them! Thank you for your help.
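For readers following along outside Alteryx, the idea described above can be sketched in Python. This is only an illustration of the category-based replacement, not the workflow from the accepted solution: the `CATEGORY_POOLS` dictionary, the `anonymize` function, and the sample column names are all made up for the example, with the dictionary standing in for the pre-made database and `template` standing in for the user-filled Template.

```python
import random

# Hypothetical category pools standing in for the pre-made database.
CATEGORY_POOLS = {
    "City": ["Random City 1", "Random City 2", "Random City 3"],
    "Company": ["Company 1", "Company 2", "Company 3"],
}


def anonymize(rows, column_categories, pools=CATEGORY_POOLS, seed=None):
    """Return copies of rows with each categorized column replaced by a random pool value."""
    rng = random.Random(seed)
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for column, category in column_categories.items():
            if column in new_row:
                new_row[column] = rng.choice(pools[category])
        cleaned.append(new_row)
    return cleaned


# Sample data (invented for illustration).
rows = [
    {"Client": "Acme Ltd", "Location": "Leeds", "Fiscal Year": 2021},
    {"Client": "Globex", "Location": "Perth", "Fiscal Year": 2022},
]

# The "Template": column header -> category chosen by the user.
template = {"Client": "Company", "Location": "City"}

cleaned = anonymize(rows, template, seed=0)
for row in cleaned:
    print(row)
```

Columns that are not listed in the template (here, Fiscal Year) pass through unchanged. Each row gets an independent random pick, so the same original value may map to different replacements; if a consistent mapping is needed, a lookup of original value to replacement would have to be added.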
Hi Qiu,
Thank you so much! This is a great solution - amazing idea. Will take a look through but this performs the exact function I needed help with! Thank you so so much!
Jason
@jason5333
Glad to be of some help, and thank you for accepting the solution.
Since this is a very interesting problem, may I submit it as a weekly challenge idea?
Hi Qiu, definitely! I really appreciate your help. I think it will be a great weekly challenge idea.
Thank you @Qiu. Thanks to your solution I have now learned about using RandInt. I like your use of RandInt([Max_Tile_SequenceNum]-1)+1 to avoid ending up with 0 as a value of Ran
I'm definitely adding that to my toolkit 🙂
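For anyone curious why the `-1 ... +1` offset matters: as far as I understand it, Alteryx's RandInt(n) returns an integer from 0 up to n, so using it directly on the maximum could produce 0. A rough Python equivalent of the offset trick is below. The function name `pick_sequence_number` is just for illustration, and Python's `randint(0, n)` is assumed here to behave like RandInt(n) (inclusive at both ends).

```python
import random


def pick_sequence_number(max_sequence_num, rng=random):
    """Pick a random integer from 1 to max_sequence_num, never 0."""
    # randint(0, n) is inclusive at both ends, so drawing from
    # 0..max-1 and adding 1 shifts the range to 1..max.
    return rng.randint(0, max_sequence_num - 1) + 1


# Draw a few sample values for a pool of 5 entries.
samples = [pick_sequence_number(5) for _ in range(1000)]
print(min(samples), max(samples))
```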