Alteryx Designer Desktop Discussions

SOLVED

Random Sampling

Bkatz99
5 - Atom

Hello,

I am a beginner-level user.

I have a dataset of nearly 20k transaction rows belonging to 37 funds. Each row is one transaction, with its corresponding fund listed. I am trying to draw 5 random transactions per fund, for a total of 185 (37 × 5). I am having trouble figuring out how to sample based on these parameters.

Can someone please assist?

Hsandness
8 - Asteroid

Assuming your data isn't sorted and each fund has an equal or nearly equal number of transactions, you could use a Sample tool with the following parameters:

1. Radio Button: 1 of every N rows

2. N = 100

3. Group By: Fund (the column name for the fund)

This would select every 100th transaction within each fund and get you at least 5 results per fund if each fund has the same number of transactions. 20,000 / 37 ≈ 540, so you would select the 100th, 200th, 300th, 400th, and 500th transactions.
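As a rough illustration of what "1 of every N rows" with Group By does, here is a minimal Python sketch. It assumes each transaction is a dict with a hypothetical "Fund" key and takes the Nth, 2Nth, 3Nth, ... row within each fund, as described above; whether Alteryx itself starts counting at row 1 or row N is not confirmed here. Note that this is systematic sampling, not random sampling, so it is only as good as the assumption that the data isn't sorted.

```python
from collections import defaultdict

def every_nth_per_group(rows, key, n):
    """Keep the n-th, 2n-th, 3n-th, ... row within each group (1-based positions)."""
    counts = defaultdict(int)  # running row count per group value
    picked = []
    for row in rows:
        g = row[key]
        counts[g] += 1
        if counts[g] % n == 0:  # this row sits at a multiple of n within its group
            picked.append(row)
    return picked

# Hypothetical data: 37 funds x 540 transactions each (19,980 rows, about 20k).
# "Fund" and "TxnID" are illustrative column names, not from the original data.
rows = [{"Fund": f"F{f:02d}", "TxnID": i} for f in range(37) for i in range(1, 541)]
sample = every_nth_per_group(rows, "Fund", 100)
print(len(sample))  # 185: positions 100, 200, 300, 400, 500 in each of the 37 funds
```

With unequal fund sizes, a fund with fewer than 500 rows would return fewer than 5 rows, which is why the next suggestion below handles that case differently.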

If this is not the case, you could run the 20,000 input rows through an iterative macro and have each fund go through the Random % Sample tool, which can select the 5 records per fund you need. Let me know if you'd like help building the iterative macro; providing sample data would be useful.
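The end result of that macro (a random draw of 5 records from each fund, regardless of fund size) can be sketched in Python as follows. This is not the Alteryx macro itself, only an analogue under stated assumptions: rows are dicts with a hypothetical "Fund" key, sampling is without replacement, and a fund with fewer than 5 rows simply returns all of its rows.

```python
import random
from collections import defaultdict

def random_n_per_group(rows, key, n, seed=None):
    """Draw n rows uniformly at random, without replacement, from each group."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    picked = []
    for g in sorted(groups):  # visit groups in a stable order
        members = groups[g]
        # assumption: a group smaller than n contributes all of its rows
        picked.extend(rng.sample(members, min(n, len(members))))
    return picked

# Hypothetical, uneven data: fund sizes range from 99 to 819 rows.
rows = [{"Fund": f"F{f:02d}", "TxnID": i} for f in range(37) for i in range(1, 100 + 20 * f)]
sample = random_n_per_group(rows, "Fund", 5, seed=42)
print(len(sample))  # 185
```

Unlike the "1 of every N rows" approach, this does not depend on funds having similar sizes or on the input order.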

griffinwelsh
12 - Quasar

An example is attached. Steps below:

1. Use the Formula tool with RAND() to create a new column of random numbers.

2. Use the Sort tool to sort by that random column.

3. Use the Sample tool to select the first N rows. Set N = 5 and group by your fund identifier.

4. Use the Select tool to drop the random number column you generated in step 1.
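The four steps above can be sketched in Python to show why they produce a random sample: attaching a random number and sorting by it shuffles the rows, so "first 5 per fund" becomes "5 random rows per fund". The function name and the "Fund"/"TxnID" columns are hypothetical, and Python's `random.Random` stands in for Alteryx's RAND().

```python
import random
from collections import defaultdict

def shuffle_sort_first_n(rows, key, n, seed=None):
    rng = random.Random(seed)
    # Step 1: attach a random number to every row (like RAND() in a Formula tool)
    keyed = [(rng.random(), row) for row in rows]
    # Step 2: sort by the random number only
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: keep the first n rows seen for each group (Sample tool, First N, Group By)
    taken = defaultdict(int)
    picked = []
    for _, row in keyed:
        g = row[key]
        if taken[g] < n:
            taken[g] += 1
            # Step 4: drop the random number by keeping only the original row
            picked.append(row)
    return picked

# Hypothetical data: 37 funds x 540 transactions.
rows = [{"Fund": f"F{f:02d}", "TxnID": i} for f in range(37) for i in range(540)]
sample = shuffle_sort_first_n(rows, "Fund", 5, seed=7)
print(len(sample))  # 185
```

Running it without a seed (or with a different seed) gives a different set of 5 rows per fund each time, which is the property that makes it a random sample.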

Qiu
21 - Polaris

@Bkatz99 @griffinwelsh
The Sample tool has a "Group By" option.

So you can group by your "Fund" column, set N = 5, and then you should be good.

Bkatz99
5 - Atom

Thank you so much! This worked perfectly.

griffinwelsh
12 - Quasar

@Qiu, without the other steps in my solution, wouldn't this return the same 5 records for each group every time, and therefore not be a random sample?
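A small Python sketch illustrates the point: taking the first 5 rows per group with no random key involved just returns the top of each fund in input order, so repeated runs on the same input always return the same rows. This assumes the Sample tool is set to "First N rows" with Group By (the screenshot is not available here); the function and column names are hypothetical.

```python
from collections import defaultdict

def first_n_per_group(rows, key, n):
    """Keep the first n rows of each group, in input order (no randomness involved)."""
    taken = defaultdict(int)
    picked = []
    for row in rows:
        g = row[key]
        if taken[g] < n:
            taken[g] += 1
            picked.append(row)
    return picked

# Hypothetical data: 37 funds x 540 transactions, TxnID 0..539 within each fund.
rows = [{"Fund": f"F{f:02d}", "TxnID": i} for f in range(37) for i in range(540)]
run1 = first_n_per_group(rows, "Fund", 5)
run2 = first_n_per_group(rows, "Fund", 5)
print(run1 == run2)                         # True: same input order, same rows
print(sorted({r["TxnID"] for r in run1}))   # [0, 1, 2, 3, 4] for every fund
```

Adding a random column and sorting by it first, as in the earlier steps, is what turns this deterministic "first N" selection into a random one.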
