Alteryx Designer Desktop Discussions

pawel_toczynski · ‎03-14-2016

I have a data creation problem that I'd like to solve with Alteryx.

Say we want create a dummy list of customers; 5000 of them. Then we want to allocate them across 5 locations as follows:

10% to Location1,

20% to Location2,

30% to Location3,

25% to Location4,

15% to Location5.

Then we want to assign them across 5 sales reps so that:

SalesRep1 gets 10%,

SalesRep2 gets 20%,

SalesRep3 gets 40%,

SalesRep2 gets 5%,

SalesRep2 gets 25% of the customers across each location.

Then we want to assign them a rating: 10% get A, 40% get AA, 50% get AAA. The rating percentages are per location and per sales rep… and then there are couple of other similar nested rules.

Does anyone know of an easy way to achieve that with Alteryx? I also want to endure I don't lose extreme values on the way, e.g. 5% of 5% of 5% so there needs to be some rounding up... and the model should always produce 5000 rows.

Thank you!

JohnJPS · ‎03-14-2016

I think the "RandomSample" tool should be able to get the sample splits for you at each step (e.g. by grabbing a random N% of the records; or just a random N records, which may be easier than trying to recalculate new percentages).

jdunkerley79 · ‎03-14-2016

A little bit of process to produce I think.

First generate the rows and assign a unique ID.

The first task is to convert to a set of customer id bins and assign a location. We can use a summarise tool to ensure each customer ID appears only once.

Then shuffle within each location to a random order and assign an Location Customer ID for each customer in each location bin.

We repeat the process to assign each Location Customer ID to a Sales Rep.

Then repeat again to assign a rating.

Have attached a flow doing this.

pawel_toczynski · ‎03-15-2016

This works great! Thank you!

JohnJPS · ‎03-15-2016

I've attached another workflow that uses R to allow flexibility in how you apply your breakdowns: just describe them however you like similar to the three categories already shown; and add them to the given mulit-join tool. The given R command will then generate the sample data, adding it to the input data frame.

The R code first processes the "breakdown matrix" and determines how many rows (customerIDs) are needed for each possible "breakdown" ... then just loops through those combinations, sampling the appropriate number of rows (from rows not already sampled) and populating the test values. (Perhaps @jdunkerley79 could turn this approach into Alteryx-only tools? :-)

Enjoy!

jdunkerley79 · ‎03-15-2016

Am game to try :) - must make some progress on my v0.3 Addins though first (HexBins tonight I hope!)

JohnJPS · ‎03-15-2016

That left me time to putz with it. :-) The attached workflow does it in similar fashion to what I described above.

Minor caveats to this one:

It's not quite as "randomized all over the place" as it just constructs the appropriate number of rows for each combination and then does a RecordID on the final construction.
It can be prone to having fewer-than-expected records in the last combination, due to rounding issues building up to it; (recommend using "Ceil" instead of "Round" and having a few too many rows in the end.
If you add new "breakdown" categories, you will have to modify other tools in the workflow, e.g to select them, include them in the formula, etc.

Otherwise it runs fast and doesn't use R.

jdunkerley79 · ‎03-15-2016

Nice solution @JohnJPS

pawel_toczynski · ‎03-16-2016

I like the R solution, thanks! We're hoping for a bit of randomnes, not an exact split ideally and that seems to the job. It seems fairly scalable too as we need to run the same process independently for a few [independent] sets of metrics and combine everything together. I'll give it a go now...

Alteryx Designer Desktop Discussions

Data creation with Alteryx

Re: Is there any way the computer vision tools can...

Re: Batch Macro

Re: How to get cell reference address from excel

Re: Replacing Forecast columns with Actual Data

Re: Row creation