Hello, I'm stuck trying to find a solution to my problem and I would appreciate any help.
Let's suppose I have a column for State where the same State can repeat several times.
I want to take a random sample of my data, but whatever the sample is, it has to keep the same proportions as my original data.
In my original data New York is 50% of the records, Maine is 17% and Massachusetts is 33%. In my sample they have to be in the same proportion.
Let's say I want a sample with 6 records. The output could be this:
Or this:
(Keep in mind that my original data set has 5000 "States", not only three, so I would like a generic solution.)
I will be very grateful to anyone who can help me.
You could use the Sample tool, group by the State column, and choose a random sample from each state.
Hi, thank you for the answer, but my original database has 5000 states, so there is no way I could do it for every single state. It would take too much time.
Similar approach to @Matthew: create a RAND() field → sort by the new field → take a percentage of the data, grouped by State, in the Sample tool.
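For anyone who wants to see the logic of those three steps outside the tool, here is a rough Python sketch. It is not how the Sample tool is implemented, just the same idea written out by hand; the function name `stratified_sample`, its parameters, and the 12-row example data are all made up for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, fraction, seed=None):
    """Keep roughly `fraction` of the records from every group (hypothetical helper)."""
    rng = random.Random(seed)

    # Step 1: attach a random number to every record (the RAND() field).
    keyed = [(rng.random(), rec) for rec in records]

    # Step 2: sort by that random number, which shuffles the records.
    keyed.sort(key=lambda pair: pair[0])

    # Step 3: group by the key column; each group keeps its shuffled order.
    groups = defaultdict(list)
    for _, rec in keyed:
        groups[rec[key]].append(rec)

    # Step 4: take the first `fraction` of each group.
    # round() can give 0 for very small groups, so tiny groups may drop out.
    sample = []
    for rows in groups.values():
        take = round(len(rows) * fraction)
        sample.extend(rows[:take])
    return sample


# Made-up data in the same proportions as the question:
# 6 New York (50%), 2 Maine (about 17%), 4 Massachusetts (about 33%).
data = (
    [{"State": "New York"} for _ in range(6)]
    + [{"State": "Maine"} for _ in range(2)]
    + [{"State": "Massachusetts"} for _ in range(4)]
)

sample = stratified_sample(data, key="State", fraction=0.5, seed=42)
counts = Counter(rec["State"] for rec in sample)
print(len(sample), counts)
```

With a fraction of 0.5 on this data the sample has 6 records: 3 New York, 1 Maine and 2 Massachusetts, so the proportions match the original. Because the grouping is driven by the values in the column, nothing has to be set up per state, whether there are 3 states or 5000.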
@arods, the tool does the grouping automatically, so you don't have to set it up for each state.