TL;DR: I need a 5% sample of a population for each distinct combination of 2 sets of categories.
Hi everyone,
Let's say I have a data set where I collect the rating responses for different desserts in different cities (where I have my dessert superstores!). Now I want to contact a small sample of customers again for a longer survey to figure out how to refine my desserts and be more successful.
I have a range of 6 desserts that I sell in 3 different cities, with thousands of ratings per dessert&city combination.
Not all desserts have the same amount of ratings; however, I want to sample 5% of each dessert&city combination. So, if I have "Los Angeles&Apple Pie" 1000 times, I want to get a random 50 ratings; if I have "Chicago&Cherry Blossom Mochi" only 200 times, I want to get 10 random ratings. 🙂
Example data showing all 6 dessert types:
| Location | Dessert | Rating |
| Los Angeles | Apple Pie | 7.2 |
| Los Angeles | Marzipan Bar | 3.3 |
| Los Angeles | Chocolate Pudding | 7.4 |
| Los Angeles | Macarons | 9.8 |
| Los Angeles | Pistachio Ice Cream | 6.1 |
| Los Angeles | Cherry Blossom Mochi | 9.9 |
| New York City | Apple Pie | 5.7 |
| New York City | Marzipan Bar | 8.3 |
| New York City | Chocolate Pudding | 9.8 |
| New York City | Macarons | 9.1 |
| New York City | Pistachio Ice Cream | 7.5 |
| New York City | Cherry Blossom Mochi | 5.0 |
| Chicago | Apple Pie | 6.6 |
| Chicago | Marzipan Bar | 8.0 |
| Chicago | Chocolate Pudding | 6.8 |
| Chicago | Macarons | 3.4 |
| Chicago | Pistachio Ice Cream | 7.1 |
| Chicago | Cherry Blossom Mochi | 8.7 |
I'm grateful for any suggestions!
Longer example data attached. 😃