Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Need Help with Random % Sample Function

mwolffe
6 - Meteoroid

Hello Everyone!

 

I am writing a Workflow that will need to pull from a very large sample of data that exceeds 40k lines.  What I am hoping to accomplish, is to pull a random 4 lines of data from each Employee and then output all of the random selections to a single excel file.

 

I was able to get this to work if if put in a bunch of the Filter functions together and specify a separate Employee for each.  The problem is I have over 600 Employees to pull data for and they could change from month to month.  So having to go back and add/delete Filters would be entirely to daunting.

 

I have to think there would be a easier way of accomplishing this task, I just can't seem to figure it out.  In the past I was able to use the Dynamic Input when I was searching a SQL query, but this is Excel and I cannot come up with a solution.  So I am hoping someone on here might have some suggestions to try. 🙂

 

I attached my test workflow and sample data.

 

mwolffe_0-1596227660039.png

 

2 REPLIES 2
ggruccio
ACE Emeritus
ACE Emeritus

Hi @mwolffe 

 

A very quick solution for you to try.  You could use the sample tool to generate a random sample grouped by Employee then take the first 4 records for every employee in another sample tool (knowing that the records were already pulled at random...you are taking a sample of the sample).

 

Your N in the first sample tool will have to be something that will at least generate 4 rows per employee.

 

ggruccio_0-1596228915651.png

ggruccio_1-1596229016326.png

I just thought of another way:

 

It would involve using a batch macro or iterative macro (probably could accomplish with either) to run a random sample for each employee then keep iterating until you run out of employees...but it may involve a bit more prep work on your end...understanding how macros work etc.

 

 

 

 

CharlieS
17 - Castor
17 - Castor

Hi @mwolffe 

 

There's definitely a better way to do this. I would use the Rand() expression assign a random number to each records (each time the report is run), Sort by that random value, then use the Group By function of the Sample tool to select the first 4 records of each employee value.It will be a random sample of 4 each time the report is run, and will scale to any number of unique employee ids or records of data you have.

 

I've attached an example of this in action. Let me know if you have any questions. 

Labels