TL;DR: I need a 5% sample of a population for each distinct combination of 2 sets of categories.
Hi everyone,
Let's say I have a data set where I collect the rating responses for different desserts in different cities (where I have my dessert superstores!). Now I want to contact a small sample of customers again for a longer survey to figure out how to refine my desserts and be more successful.
I have a range of 6 desserts that I sell in 3 different cities, with thousands of ratings per dessert&city combination.
Not all desserts have the same amount of ratings; however, I want to sample 5% of each dessert&city combination. So, if I have "Los Angeles&Apple Pie" 1000 times, I want to get a random 50 ratings; if I have "Chicago&Cherry Blossom Mochi" only 200 times, I want to get 10 random ratings. 🙂
Example data showing all 6 dessert types:
Location | Dessert | Rating |
Los Angeles | Apple Pie | 7.2 |
Los Angeles | Marzipan Bar | 3.3 |
Los Angeles | Chocolate Pudding | 7.4 |
Los Angeles | Macarons | 9.8 |
Los Angeles | Pistachio Ice Cream | 6.1 |
Los Angeles | Cherry Blossom Mochi | 9.9 |
New York City | Apple Pie | 5.7 |
New York City | Marzipan Bar | 8.3 |
New York City | Chocolate Pudding | 9.8 |
New York City | Macarons | 9.1 |
New York City | Pistachio Ice Cream | 7.5 |
New York City | Cherry Blossom Mochi | 5.0 |
Chicago | Apple Pie | 6.6 |
Chicago | Marzipan Bar | 8.0 |
Chicago | Chocolate Pudding | 6.8 |
Chicago | Macarons | 3.4 |
Chicago | Pistachio Ice Cream | 7.1 |
Chicago | Cherry Blossom Mochi | 8.7 |
I'm grateful for any suggestions!
Longer example data attached. 😃
Solved! Go to Solution.
Hi @Inactive User ,
Would something like this work for you?
The dataset is not large enough to see the sample, but it should work in a larger dataset
Hi @Inactive User
Here is a workflow for the task.
Output:
Workflow:
I can creating a RandomID and sorting by it to random zine the data.
Then use sample tool n%
Currently 5% is too small with the number of rows. With more rows it will be more effective🙂
Hope this helps 🙂
If this post helps you please mark it as solution. And give a like if you dont mind 😀👍
@AngelosPachis Oh I'm so embarrassed, I can't believe it was this easy 🙈 Clearly I need to take a break from working on this -- thank you so much!
Hi @Inactive User
As per your requirement last line said you wanted random sample right
Just using a sample tool will not pick random sample. It only pick first n% rows.
My workflow randomizes the selection with help of a randomID
And 1 in every n rows is not same as N%
Hi @Inactive User ,
Don't be embarrassed since as it turned out I wasn't totally correct. As @atcodedog05 passionately pointed out, indeed my previous answer won't give you a perfectly random sample.
Instead, you can either follow the workflow he proposed, but sample for the first 5% of rows to get the correct answer. Otherwise, you can use a batch macro with a random sample tool, which you can also find attached
Hope that solves it