
Alteryx Designer Discussions


Distributing records to groups so that the groups are as similar as possible


I've run into a bit of a problem that I'll try to explain with a hypothetical scenario.


I have 100 records that I would like to place into 20 groups of 5 based on 3 factors.  The factors are gender (M/F), program (5 programs) and activity level (3 levels).  I’d like each of the 20 groups to be as similar as possible across those factors.  I guess I’m not really grouping the factors but distributing them as evenly as possible across the different groups. 


If all of those factors were evenly split (50 males/50 females, 20 from each program and ~33 at each activity level), each group would have one person from each of the 5 programs, 2 or 3 males, and 1 or 2 from each activity level. In reality, though, the data won’t be evenly distributed.
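To make the arithmetic above concrete, here is a minimal Python sketch (the thread itself contains no code) that works out, for one factor, the fewest and most records of each level a group would hold if that level were spread as evenly as possible. The helper name `target_counts` and the level labels (`Low`/`Medium`/`High`, programs `A` to `E`) are hypothetical.

```python
from collections import Counter

def target_counts(values, n_groups):
    """For each level of one factor, return (min, max) records per group
    when that level is spread as evenly as possible over n_groups."""
    targets = {}
    for level, count in Counter(values).items():
        base, extra = divmod(count, n_groups)
        # `extra` groups get base + 1; if extra == 0, every group gets exactly base.
        targets[level] = (base, base + 1 if extra else base)
    return targets

# The evenly distributed hypothetical: 100 records into 20 groups.
genders = ["M"] * 50 + ["F"] * 50
programs = [p for p in "ABCDE" for _ in range(20)]
activity = ["Low"] * 34 + ["Medium"] * 33 + ["High"] * 33

print(target_counts(genders, 20))   # {'M': (2, 3), 'F': (2, 3)}
print(target_counts(programs, 20))  # every program: (1, 1)
print(target_counts(activity, 20))  # {'Low': (1, 2), 'Medium': (1, 2), 'High': (1, 2)}

# 360-student case from later in the thread: 37 female, 323 male, 60 groups.
print(target_counts(["F"] * 37 + ["M"] * 323, 60))  # {'F': (0, 1), 'M': (5, 6)}
```

The last line shows why an uneven split forces some groups to differ: 37 groups can take one female each, and the remaining 23 groups end up with six males.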


Does anyone have any suggestions as to what tools I could use to assign people to groups to get as even a spread as possible across the three factors?


Thanks in advance.





I think the Tile tool might be useful here, but you’ll have to play around with it. I’ll have a go when I get to my PC and let you know.

Thanks for the tip David I'll do some tests with that.


Do you have some sample data you could send me? Then I can play with it.


Hi David,


This is a sample of 360 students. Would it be possible to create 60 groups of 6 students, with each group as similar as possible based on Gender, Program and Activity Level? There isn't an even distribution in any of those categories, which complicates things a little: e.g. there are only 37 female users, so if they were distributed evenly, 23 groups would be all male.


It might be necessary to stratify the factors to decide which is more important, e.g. should I place a student in one group because it satisfies a need based on gender, or in another group to satisfy a need for a student with a high activity level? If that's the case, let's put Gender as most important, then Program, and lastly Activity Level.
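One way to make this priority order checkable is to score a candidate set of groups per factor and compare the scores in priority order. The sketch below is an assumption, not something proposed in the thread: the helper names `imbalance` and `priority_score` and the dict keys `gender`, `program` and `activity` are hypothetical. It measures how far the per-group counts of each level drift apart, and relies on Python's lexicographic tuple comparison so that gender balance always outranks program balance, which outranks activity balance.

```python
def imbalance(groups, factor):
    """Sum over the levels of `factor` of (max - min) per-group count.
    0 means every group holds the same number of each level."""
    levels = {rec[factor] for g in groups for rec in g}
    total = 0
    for level in levels:
        counts = [sum(1 for rec in g if rec[factor] == level) for g in groups]
        total += max(counts) - min(counts)
    return total

def priority_score(groups, priority=("gender", "program", "activity")):
    """Imbalances in priority order; tuples compare lexicographically,
    so a lower-priority factor only breaks ties in the ones before it."""
    return tuple(imbalance(groups, f) for f in priority)

# Four hypothetical students and three ways to split them into 2 groups.
r1 = {"gender": "F", "program": "P1", "activity": "High"}
r2 = {"gender": "F", "program": "P2", "activity": "Low"}
r3 = {"gender": "M", "program": "P1", "activity": "Low"}
r4 = {"gender": "M", "program": "P2", "activity": "High"}

option1 = [[r1, r3], [r2, r4]]  # gender and activity balanced, program not
option2 = [[r1, r4], [r2, r3]]  # gender and program balanced, activity not
option3 = [[r1, r2], [r3, r4]]  # gender unbalanced

best = min([option1, option2, option3], key=priority_score)
print(priority_score(best))  # (0, 0, 4)
```

Option 2 wins because, with gender tied, an unbalanced program counts for more than an unbalanced activity level.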




Ok, here are my thoughts so far.


Sort the data by Gender Code, Program Code and Activity Level. This means that sequential students will be the most similar based on these criteria.


Then create 6 tiles, each with 60 records. Each record in a tile gets a sequence number, so there will be six records numbered 1, six numbered 2, and so on up to six numbered 60.


Put all the number 1s together in one group, all the number 2s in another, and so on.


This should give you 60 groups as similar as the sorting allows.
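Outside Alteryx, the same sort, tile and group-by-sequence steps can be sketched in a few lines of Python. This is only an illustration of the idea, not the Tile tool itself: `tile_groups` is a hypothetical name, records are assumed to be `(gender code, program code, activity level)` tuples, and the record count is assumed to be a multiple of the number of groups.

```python
def tile_groups(records, n_groups, sort_key):
    """Sort, cut into equal tiles of n_groups records, then group records
    by their position within each tile."""
    if len(records) % n_groups:
        raise ValueError("record count must be a multiple of n_groups")
    ordered = sorted(records, key=sort_key)
    groups = [[] for _ in range(n_groups)]
    for i, rec in enumerate(ordered):
        # Tile number would be i // n_groups + 1; the sequence number within
        # the tile is i % n_groups + 1, and that sequence number is the group.
        groups[i % n_groups].append(rec)
    return groups

# Toy records (hypothetical): (gender code, program code, activity level).
students = [
    ("M", "P2", 1), ("F", "P1", 3), ("M", "P1", 3), ("M", "P2", 3),
    ("F", "P2", 1), ("M", "P1", 1), ("M", "P1", 2), ("M", "P2", 2),
]

# Sort priority from the thread: gender, then program, then activity level.
groups = tile_groups(students, 4, sort_key=lambda s: (s[0], s[1], s[2]))
for n, group in enumerate(groups, start=1):
    print(n, group)
```

Because neighbouring records in the sorted order land in different groups, the two `F` records go to groups 1 and 2 rather than together. The toy output also shows the method's limit: groups 1 and 2 each hold two students from the same program, since the sort puts gender first. It is a quick heuristic rather than an optimiser.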


You could go a little more sophisticated if you want to give certain criteria more weighting, so that you end up with a ranking score for each student. Then sort by that ranking and do the same as above.
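A weighted ranking score could look like the Python sketch below. The numeric codes (gender 0/1, program 1 to 5, activity 1 to 3), the weights and the names `ranking_score` and `groups_by_score` are all hypothetical. With weights of 100, 10 and 1, each factor's contribution can never be outweighed by the factors below it, so sorting by the score reproduces a plain sort by gender, then program, then activity.

```python
from itertools import product

# Hypothetical weights; codes assumed: gender 0/1, program 1-5, activity 1-3.
WEIGHTS = {"gender": 100, "program": 10, "activity": 1}

def ranking_score(rec, weights=WEIGHTS):
    """Weighted sum of a record's numeric factor codes."""
    return sum(w * rec[factor] for factor, w in weights.items())

def groups_by_score(records, n_groups, weights=WEIGHTS):
    """Sort by ranking score, then deal records out in turn (same as tiling)."""
    ranked = sorted(records, key=lambda r: ranking_score(r, weights))
    groups = [[] for _ in range(n_groups)]
    for i, rec in enumerate(ranked):
        groups[i % n_groups].append(rec)
    return groups

# Every code combination gets a distinct score, and the score order
# matches a plain sort by (gender, program, activity).
combos = [dict(gender=g, program=p, activity=a)
          for g, p, a in product(range(2), range(1, 6), range(1, 4))]
by_score = sorted(combos, key=ranking_score)
by_tuple = sorted(combos, key=lambda r: (r["gender"], r["program"], r["activity"]))
print(by_score == by_tuple)  # True
```

Narrowing the gaps between weights (for example gender 3, program 2, activity 1) lets a lower-priority factor outrank a higher one for some students, which is where weighting departs from a strict sort.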


Let me know if this works for you.


[Attached image: equal records grouping.png]


That is awesome, I'll definitely use that method.  I also like your point about weighting which is something that I hadn't really considered but will have a go at if needed.  Thanks so much for taking the time to help me on this.  I'd been getting a bit bogged down and you've really moved things along for me.