Alert: There is a planned Community maintenance outage October 16th from approximately 10 - 11 PM PST. During this time the Alteryx Community will be inaccessible. Thank you for your understanding!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Splitting data into different but equal random samples

Blessing
5 - Atom

I have 3 data sets (eg Scores from three different Universities) all mixed up in a single excel sheet and with different number of scores. Eg Oxford (AT1) sample with 4000 scores, Cambridge (AT2) with 3200 scores, and Portsmouth (AT3) with 1800 scores and I want to trim them so that the sample sizes match the university with the lowest number of scores (in this case 1800), what would the macros look like? Please note the spreadsheet has 2 columns i.e. University and Score. Please see picture below

 

Blessing_0-1603463801438.png

 

4 REPLIES 4
Maskell_Rascal
13 - Pulsar

Hi @Blessing 

 

If this is more of a one off thing, you can use the Sample tool and group by the University column to collect the number of rows needed. 

 

This is my starting data:

Maskell_Rascal_0-1603463811808.png

 

So since I only have 7 rows of scores for Portsmouth, I set my the "N = " to 7 and select the First N rows option. 

 

Maskell_Rascal_1-1603463884692.png

 

You would want to set your sample tool to 1800 to trim it down as desired. 

 

If this solves your issue please mark the answer as correct, if not let me know!

 

Thanks!

Phil

Blessing
5 - Atom

Thanks Phil, does this randomly extract scores to give a random sample for each University?

Maskell_Rascal
13 - Pulsar

Hey @Blessing 

 

Unfortunately it is not random, but in order of the rows of the data. You can get a randomized sampling, but you'd need to spit your data by University to do it. The Random % tool lets you set the number of random records you want by percentage or fixed number, so this would be your best option. 

Maskell_Rascal_0-1603465032874.png

 

I've attached a sample workflow for you to try out. 

 

If this solves your issue please mark the answer as correct, if not let me know!

 

Thanks!

Phil

Maskell_Rascal
13 - Pulsar

@Blessing One other option I just thought of would be to use a formula tool to assign a random value, sort by University and random value, and then use the Sample tool to select your desired number of rows for each group. 

 

I've attached that example for you as well. 

 

Again, let me know if this solves your issue or not. 

 

Thanks!

Phil

Labels