Hello!
I have a set of data with a "program" column that could contain multiple (and a variable number) of groups.
Record | Name | Program |
1 | John | A |
2 | Mary | A |
3 | Sue | A |
4 | Jenny | B |
5 | Bob | A |
6 | Clark | C |
7 | Jimmy | C |
8 | Preston | B |
9 | Raymond | B |
10 | Doe | A |
11 | Peter | C |
12 | Calvin | B |
13 | Tommy | A |
I also have a predetermined, and variable number of records that I need from each group, randomly selected. The quantity to select for each group is stored in an xlsx/csv file.
Program | Qty of Records to Randomly Select |
A | 1 |
B | 3 |
C | 1 |
I need to have a macro that will accept the CSV with the quantities as an input, as well as the actual database records, and then output a set of records randomly selected in the quantities defined. The output should look something like this (sorting is irrelevant so long as its random).
Record ID | Name | Program |
10 | Doe | A |
4 | Jenny | B |
8 | Preston | B |
9 | Raymond | B |
6 | Clark | C |
I don't even know where to begin, I think I need an iterative macro, but I don't know how to align the inputs or have two streams going on. Any help would be greatly appreciated!
Solved! Go to Solution.
Here is an ordinary macro that will do what you need. The macro assigns random numbers to all reords, then sorts the records into groups ordered by random number. The relevant sample size is then added to each record. A formula determines if each record is in or out of the sample based on comparing the sequence number of the record to the sample size. This yields sampling without replacement, which appears to be the requirement. For the same input the workflow should deliver a different sample on each iteration.
This is brilliant - thank you!!