Random Sampling
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I have a dataset of all orders I placed in 2024. Within this data, the notable columns are the date of the order, the order number, and the type of order (1,2, or 3). My goal is to randomly select 5 orders from each month, ideally of different order types. It can become tricky as there can be duplicate order numbers, since within each order number there are item #s. For example: Order number 123 has a ball and a glove in the shipment, therefor there are two line items for that order, making the order number appear twice.
I am struggling to find an efficient way to automate this previously overwhelmingly manual task in alteryx.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello @pbresney
Thank you for your post, I have tried to solve your questions and here is my approach.
Firstly, I have attempted to create some dummy data. In my dummy data there is a list of unique order ID's with each order ID having an order "Month" and a "Type" associated to it.
From here, I have broken the problem down into two stages. The first stage I have tackled is taking one order at random per month per order type. I have done this by assighning a random value to every Order ID and then using the sort and sampe tool to take the Order ID with the lowest random value, per Month, per order Type. This enforces your requirement of atleast one of each order type per month, in the random samping.
As there are 3 distinct order types (1,2 and 3), and I have already selected one order from each type, I still need to get two more random orders for each month. To do this I have followed a similar process with the random number, then sort / sample tool; the main difference being I nolonger group by order Type, and select the 2 lowest random numbers per month, instead of just the one.
Now a total of 5 differnt Order IDs have been selected per month so the union tool has been used to combine the two seperate stages from above.
 
I apprechiate that I have used a list where each Order ID only has one record associated with it. If you have multiple records / products per Order ID, then I would suggest using a summarise tool to get a unique list of Order ID's, then following the process I have outlined above. Once you have done this you can join back to your source data, using Order ID, to bring in all the data associated with each order.
I hope this helps, please let me know if you have any questions.
Regards - Pilsner
