Sample based on two columns

TL;DR: I need a random 5% sample from every distinct combination of two categorical columns (city and dessert).

Hi everyone,

Let's say I have a data set of customer ratings for different desserts in different cities (where I have my dessert superstores!). I now want to contact a small sample of these customers again with a longer survey, to find out how to refine my desserts and be more successful.

I sell a range of 6 desserts in 3 different cities, with hundreds to thousands of ratings per dessert&city combination.

Not all desserts have the same number of ratings, but I want to sample 5% of each dessert&city combination. So if "Los Angeles&Apple Pie" appears 1000 times, I want 50 random ratings from it; if "Chicago&Cherry Blossom Mochi" appears only 200 times, I want 10 random ratings. 🙂
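One way to do this is stratified sampling: group the rows by the (Location, Dessert) pair and draw a simple random sample without replacement inside each group. Below is a minimal sketch in plain Python, assuming each rating is a dict keyed by the column names from the example table. The function name `stratified_sample` is hypothetical, and so is the rounding rule (round half up, so a group of 9 ratings gets 0 picks and a group of 10 gets 1), since the post doesn't say how to handle fractional sample sizes. The demo rows are synthetic, not real ratings.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key_fields, percent=5, seed=None):
    """Draw `percent`% of the rows from every distinct combination of `key_fields`.

    Assumptions (hypothetical, not from the post): each row is a dict, and the
    per-group size is n * percent / 100 rounded half up, using integer math
    to avoid floating-point surprises.
    """
    rng = random.Random(seed)

    # Bucket the rows by their (Location, Dessert)-style key.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[field] for field in key_fields)].append(row)

    picked = []
    for members in groups.values():
        k = (len(members) * percent + 50) // 100  # e.g. 1000 -> 50, 200 -> 10
        picked.extend(rng.sample(members, k))  # without replacement, within the group
    return picked


# Synthetic demo data (not real ratings): 1000 rows for one combination, 200 for another.
rows = [{"Location": "Los Angeles", "Dessert": "Apple Pie", "Rating": i % 10} for i in range(1000)]
rows += [{"Location": "Chicago", "Dessert": "Cherry Blossom Mochi", "Rating": i % 10} for i in range(200)]

picked = stratified_sample(rows, ("Location", "Dessert"), percent=5, seed=42)
counts = Counter((r["Location"], r["Dessert"]) for r in picked)
print(dict(counts))  # → {('Los Angeles', 'Apple Pie'): 50, ('Chicago', 'Cherry Blossom Mochi'): 10}
```

Passing a `seed` makes the draw reproducible, which helps if the same customer list has to be regenerated later. If the data already lives in a pandas DataFrame, `df.groupby(["Location", "Dessert"]).sample(frac=0.05)` (available since pandas 1.1) does per-group sampling in one call, though its rounding of fractional sizes may differ from the sketch above.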

Example data showing all 6 dessert types:

Location	Dessert	Rating
Los Angeles	Apple Pie	7.2
Los Angeles	Marzipan Bar	3.3
Los Angeles	Chocolate Pudding	7.4
Los Angeles	Macarons	9.8
Los Angeles	Pistachio Ice Cream	6.1
Los Angeles	Cherry Blossom Mochi	9.9
New York City	Apple Pie	5.7
New York City	Marzipan Bar	8.3
New York City	Chocolate Pudding	9.8
New York City	Macarons	9.1
New York City	Pistachio Ice Cream	7.5
New York City	Cherry Blossom Mochi	5.0
Chicago	Apple Pie	6.6
Chicago	Marzipan Bar	8.0
Chicago	Chocolate Pudding	6.8
Chicago	Macarons	3.4
Chicago	Pistachio Ice Cream	7.1
Chicago	Cherry Blossom Mochi	8.7
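To sanity-check the grouping keys on the example table itself, here is a small sketch that reads the tab-separated data with Python's standard `csv` module and reports, for each (Location, Dessert) pair, the number of ratings and the 5% target size. The function name `planned_sample_sizes` is hypothetical, and rounding half up is my assumption, not something stated in the post. Because the example has exactly one rating per combination, every target comes out as 0; with the real data (hundreds or thousands of ratings per pair) the targets would be non-zero.

```python
import csv
import io
from collections import Counter

# The example table from the post, tab-separated.
EXAMPLE_TSV = """Location\tDessert\tRating
Los Angeles\tApple Pie\t7.2
Los Angeles\tMarzipan Bar\t3.3
Los Angeles\tChocolate Pudding\t7.4
Los Angeles\tMacarons\t9.8
Los Angeles\tPistachio Ice Cream\t6.1
Los Angeles\tCherry Blossom Mochi\t9.9
New York City\tApple Pie\t5.7
New York City\tMarzipan Bar\t8.3
New York City\tChocolate Pudding\t9.8
New York City\tMacarons\t9.1
New York City\tPistachio Ice Cream\t7.5
New York City\tCherry Blossom Mochi\t5.0
Chicago\tApple Pie\t6.6
Chicago\tMarzipan Bar\t8.0
Chicago\tChocolate Pudding\t6.8
Chicago\tMacarons\t3.4
Chicago\tPistachio Ice Cream\t7.1
Chicago\tCherry Blossom Mochi\t8.7
"""


def planned_sample_sizes(tsv_text, percent=5):
    """Map each (Location, Dessert) pair to (number of ratings, target sample size).

    Assumption: the target is n * percent / 100 rounded half up with integer math.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter((row["Location"], row["Dessert"]) for row in reader)
    return {key: (n, (n * percent + 50) // 100) for key, n in counts.items()}


sizes = planned_sample_sizes(EXAMPLE_TSV)
print(len(sizes))  # → 18 (3 cities x 6 desserts)
print(sizes[("Chicago", "Apple Pie")])  # → (1, 0)
```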