I have a data creation problem that I'd like to solve with Alteryx.
Say we want create a dummy list of customers; 5000 of them. Then we want to allocate them across 5 locations as follows:
10% to Location1,
20% to Location2,
30% to Location3,
25% to Location4,
15% to Location5.
Then we want to assign them across 5 sales reps so that:
SalesRep1 gets 10%,
SalesRep2 gets 20%,
SalesRep3 gets 40%,
SalesRep2 gets 5%,
SalesRep2 gets 25% of the customers across each location.
Then we want to assign them a rating: 10% get A, 40% get AA, 50% get AAA. The rating percentages are per location and per sales rep… and then there are couple of other similar nested rules.
Does anyone know of an easy way to achieve that with Alteryx? I also want to endure I don't lose extreme values on the way, e.g. 5% of 5% of 5% so there needs to be some rounding up... and the model should always produce 5000 rows.
Thank you!
Solved! Go to Solution.
I think the "RandomSample" tool should be able to get the sample splits for you at each step (e.g. by grabbing a random N% of the records; or just a random N records, which may be easier than trying to recalculate new percentages).
A little bit of process to produce I think.
First generate the rows and assign a unique ID.
The first task is to convert to a set of customer id bins and assign a location. We can use a summarise tool to ensure each customer ID appears only once.
Then shuffle within each location to a random order and assign an Location Customer ID for each customer in each location bin.
We repeat the process to assign each Location Customer ID to a Sales Rep.
Then repeat again to assign a rating.
Have attached a flow doing this.
This works great! Thank you!
I've attached another workflow that uses R to allow flexibility in how you apply your breakdowns: just describe them however you like similar to the three categories already shown; and add them to the given mulit-join tool. The given R command will then generate the sample data, adding it to the input data frame.
The R code first processes the "breakdown matrix" and determines how many rows (customerIDs) are needed for each possible "breakdown" ... then just loops through those combinations, sampling the appropriate number of rows (from rows not already sampled) and populating the test values. (Perhaps @jdunkerley79 could turn this approach into Alteryx-only tools? :-)
Enjoy!
Am game to try :) - must make some progress on my v0.3 Addins though first (HexBins tonight I hope!)
That left me time to putz with it. :-) The attached workflow does it in similar fashion to what I described above.
Minor caveats to this one:
Otherwise it runs fast and doesn't use R.
Nice solution @JohnJPS
I like the R solution, thanks! We're hoping for a bit of randomnes, not an exact split ideally and that seems to the job. It seems fairly scalable too as we need to run the same process independently for a few [independent] sets of metrics and combine everything together. I'll give it a go now...