Hi folks, I need your expert advice on the scenario below.
I am trying to draw random samples from a very large data set (over 60k records). The conditions are as follows:
1. The data set has user names and the case IDs they have worked on. Some users have worked on 30k records and a few on far fewer (5, 10, etc.).
2. I need to pick samples in proportion to the total number of cases each user has worked on, e.g. users with many cases need more samples and users with few cases need fewer.
3. Each user should have at least 1 sample.
4. The total number of samples should be equal to 50.
I have been breaking my head over this for a few days, so any help is appreciated.
Thank you in advance!
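
For anyone picking this up, one possible reading of conditions 2 to 4 can be sketched in Python as below. It is only a sketch under assumptions that are not stated above: the records are available as (user, case_id) pairs, and the names `allocate`, `sample_cases`, and the example users are made up for illustration. It gives every user one sample first, then hands out the remaining samples one at a time to whichever user is furthest below their proportional share. This is a simple greedy allocation, not necessarily the only or best method.

```python
import random
from collections import defaultdict


def allocate(case_counts, total=50):
    """Return {user: n_samples}: at least 1 per user, summing to `total`.

    `case_counts` maps each user to the number of cases they worked on
    (assumed to be >= 1 for every user).
    """
    users = list(case_counts)
    grand_total = sum(case_counts.values())
    if len(users) > total:
        raise ValueError("more users than samples; cannot give each user one")
    if grand_total < total:
        raise ValueError("fewer cases than requested samples")

    alloc = {u: 1 for u in users}  # condition 3: at least one sample each
    for _ in range(total - len(users)):
        # Only users who still have unsampled cases can receive more.
        candidates = [u for u in users if alloc[u] < case_counts[u]]
        # Pick the user furthest below their proportional target.
        best = max(
            candidates,
            key=lambda u: case_counts[u] / grand_total * total - alloc[u],
        )
        alloc[best] += 1
    return alloc


def sample_cases(records, total=50, seed=None):
    """Draw random case IDs per user from (user, case_id) pairs."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, case_id in records:
        by_user[user].append(case_id)
    alloc = allocate({u: len(ids) for u, ids in by_user.items()}, total)
    # random.sample draws without replacement within each user's cases.
    return {u: rng.sample(by_user[u], alloc[u]) for u in by_user}


# Made-up example data: one heavy user and two light users.
records = (
    [("alice", i) for i in range(30000)]
    + [("bob", i) for i in range(30000, 30010)]
    + [("carol", i) for i in range(30010, 30015)]
)
picked = sample_cases(records, total=50, seed=0)
for user, ids in picked.items():
    print(user, len(ids))
```

Note that with very uneven counts like these, the "at least 1" rule means light users end up over-represented relative to a strict percentage, so the result is proportional only approximately. If there are more than 50 users, conditions 3 and 4 cannot both hold, and the sketch raises an error in that case.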