Random Sampling
Hello,
I am a beginner-level user.
I have a data set of nearly 20k rows of transactions that belong to 37 funds. Each row is one transaction, with its corresponding fund listed. I am trying to obtain 5 randomly sampled transactions per fund, for a total of 185. I am having trouble figuring out how to sample based on these parameters.
Can someone please assist?
Assuming your data isn't sorted and each fund has roughly the same number of transactions, you could use a Sample tool with the following parameters.
1. Radio Button: 1 of every N rows
2. N = 100
3. Group By: Fund (The column name for the fund)
This would select every 100th transaction within each fund and give you 5 results per fund if the transactions are spread evenly across funds: 20,000 / 37 ≈ 540 transactions per fund, so you would select the 100th, 200th, 300th, 400th, and 500th transaction of each fund.
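To make the arithmetic concrete, here is a rough Python sketch of the "every Nth row per group" idea as described above. It is only an illustration, not how the Sample tool is implemented, and the exact row the tool picks within each block of N may differ. The function name, the `fund` and `txn_id` fields, and the evenly sized test data are all made up for the example.

```python
from collections import defaultdict


def every_nth_per_group(rows, key, n):
    """Keep the n-th, 2n-th, 3n-th, ... row of each group, in input order."""
    counts = defaultdict(int)  # rows seen so far for each group
    picked = []
    for row in rows:
        counts[row[key]] += 1
        if counts[row[key]] % n == 0:
            picked.append(row)
    return picked


# Hypothetical data: 37 funds with exactly 540 transactions each.
rows = [
    {"fund": f"F{f:02d}", "txn_id": i}
    for f in range(37)
    for i in range(540)
]

sample = every_nth_per_group(rows, "fund", 100)
per_fund = defaultdict(int)
for row in sample:
    per_fund[row["fund"]] += 1
```

With 540 rows per fund this keeps 5 rows for every fund. A fund with fewer than 500 transactions would come back with fewer than 5 rows, which is why the even-distribution assumption matters.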
If this is not the case, you could run the 20,000 rows through an iterative macro so that each fund goes through its own random sample, using the Random % Sample tool to select the 5 records per fund you need. Let me know if you'd like help building the iterative macro; sample data would be useful.
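The per-fund loop can be sketched in Python as follows. This is a minimal illustration of the logic (split by fund, then draw 5 at random from each), not the macro itself. The helper name, field names, seed, and the unevenly sized test data are assumptions for the example.

```python
import random
from collections import defaultdict


def sample_per_fund(rows, key, k, seed=None):
    """Draw up to k rows at random, without replacement, from each group."""
    rng = random.Random(seed)  # pass a seed only if you need repeatable output
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    picked = []
    for members in groups.values():
        # A group smaller than k contributes all of its rows.
        picked.extend(rng.sample(members, min(k, len(members))))
    return picked


# Hypothetical data: 37 funds of very different sizes (10, 35, 60, ... rows).
rows = [
    {"fund": f"F{f:02d}", "txn_id": i}
    for f in range(37)
    for i in range(10 + 25 * f)
]

sample = sample_per_fund(rows, "fund", 5, seed=42)
```

Unlike the every-Nth-row approach, this does not depend on the funds being similar in size or on the order of the input.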
Example is attached. Steps below:
1. Use the Formula tool with RAND() to create a new column of random numbers.
2. Use the Sort tool to sort by that random-number column.
3. Use the Sample tool to select the first N rows. Set N = 5 and group by your fund identifier.
4. Use the Select tool to drop the random-number column that you created in step 1.
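For anyone who wants to see the logic outside the workflow, the four steps above map roughly onto the following Python sketch. It is an illustration under assumed names (`shuffle_then_first_n`, `fund`, `txn_id`), not a description of what the tools do internally.

```python
import random


def shuffle_then_first_n(rows, key, n, seed=None):
    """Random key, sort by it, keep the first n rows per group, drop the key."""
    rng = random.Random(seed)

    # Step 1: attach a random number to every row.
    keyed = [(rng.random(), row) for row in rows]

    # Step 2: sort by the random number only.
    keyed.sort(key=lambda pair: pair[0])

    # Step 3: keep the first n rows encountered for each group.
    counts = {}
    picked = []
    for _, row in keyed:
        seen = counts.get(row[key], 0)
        if seen < n:
            counts[row[key]] = seen + 1
            # Step 4: only the row is kept, so the random number is dropped.
            picked.append(row)
    return picked


# Hypothetical data: 37 funds with between 20 and 56 transactions each.
rows = [
    {"fund": f"F{f:02d}", "txn_id": i}
    for f in range(37)
    for i in range(20 + f)
]

sample = shuffle_then_first_n(rows, "fund", 5)
```

Because the sort order is random, the "first 5" of each fund is a different set on each run unless you fix the seed.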
@Bkatz99 @griffinwelsh
The Sample tool has a "Group by" option.
If you group by your "Fund" column and set N = 5, you should be good.
Thank you so much! This worked perfectly.
@Qiu Without the other steps in my solution, wouldn't this return the same 5 records for each group every time, and therefore not be a random sample?
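A small Python sketch of that concern, using made-up data and a hypothetical `first_n_per_group` helper: taking the first N rows per group depends only on row order, so without a shuffle it gives the same rows on every run.

```python
import random


def first_n_per_group(rows, key, n):
    """Keep the first n rows of each group, in whatever order the rows arrive."""
    counts = {}
    picked = []
    for row in rows:
        seen = counts.get(row[key], 0)
        if seen < n:
            counts[row[key]] = seen + 1
            picked.append(row)
    return picked


# Hypothetical data: two funds with 100 transactions each, in file order.
rows = [{"fund": f, "txn_id": i} for f in ("A", "B") for i in range(100)]

run_1 = first_n_per_group(rows, "fund", 5)
run_2 = first_n_per_group(rows, "fund", 5)
same_every_time = run_1 == run_2  # True: nothing random has happened
first_ids = [r["txn_id"] for r in run_1 if r["fund"] == "A"]  # [0, 1, 2, 3, 4]

# Shuffling the rows first makes the "first 5" depend on the random order.
shuffled = rows[:]
random.Random(1).shuffle(shuffled)
run_3 = first_n_per_group(shuffled, "fund", 5)
```

So grouping alone gives 5 rows per fund, but they are only a random sample if the row order was randomized beforehand.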
