Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Data creation with Alteryx

pawel_toczynski
5 - Atom

I have a data creation problem that I'd like to solve with Alteryx.

 

Say we want create a dummy list of customers; 5000 of them. Then we want to allocate them across 5 locations as follows:

10% to Location1,

20% to Location2,

30% to Location3,

25% to Location4,

15% to Location5.

 

Then we want to assign them across 5 sales reps so that:

SalesRep1 gets 10%,

SalesRep2 gets 20%,

SalesRep3 gets 40%,

SalesRep2 gets 5%,

SalesRep2 gets 25% of the customers across each location.

 

Then we want to assign them a rating: 10% get A, 40% get AA, 50% get AAA. The rating percentages are per location and per sales rep… and then there are couple of other similar nested rules.

 

Does anyone know of an easy way to achieve that with Alteryx? I also want to endure I don't lose extreme values on the way, e.g. 5% of 5% of 5% so there needs to be some rounding up... and the model should always produce 5000 rows. 

 

Thank you!

8 REPLIES 8
JohnJPS
15 - Aurora

I think the "RandomSample" tool should be able to get the sample splits for you at each step (e.g. by grabbing a random N% of the records; or just a random N records, which may be easier than trying to recalculate new percentages).

jdunkerley79
ACE Emeritus
ACE Emeritus

A little bit of process to produce I think.

 

First generate the rows and assign a unique ID.

The first task is to convert to a set of customer id bins and assign a location. We can use a summarise tool to ensure each customer ID appears only once.

 

Then shuffle within each location to a random order and assign an Location  Customer ID for each customer in each location bin.

We repeat the process to assign each Location Customer ID to a Sales Rep.

 

Then repeat again to assign a rating.

 

Have attached a flow doing this.

pawel_toczynski
5 - Atom

This works great! Thank you!

JohnJPS
15 - Aurora

I've attached another workflow that uses R to allow flexibility in how you apply your breakdowns: just describe them however you like similar to the three categories already shown; and add them to the given mulit-join tool.  The given R command will then generate the sample data, adding it to the input data frame.

 

The R code first processes the "breakdown matrix" and determines how many rows (customerIDs) are needed for each possible "breakdown" ... then just loops through those combinations, sampling the appropriate number of rows (from rows not already sampled) and populating the test values.  (Perhaps @jdunkerley79 could turn this approach into Alteryx-only tools?  :-)

 

Enjoy!

 

jdunkerley79
ACE Emeritus
ACE Emeritus

Am game to try :) - must make some progress on my v0.3 Addins though first (HexBins tonight I hope!)

JohnJPS
15 - Aurora

That left me time to putz with it.  :-)  The attached workflow does it in similar fashion to what I described above.

 

Minor caveats to this one:

  • It's not quite as "randomized all over the place" as it just constructs the appropriate number of rows for each combination and then does a RecordID on the final construction.
  • It can be prone to having fewer-than-expected records in the last combination, due to rounding issues building up to it; (recommend using "Ceil" instead of "Round" and having a few too many rows in the end.
  • If you add new "breakdown" categories, you will have to modify other tools in the workflow, e.g to select them, include them in the formula, etc.

Otherwise it runs fast and doesn't use R.

 

jdunkerley79
ACE Emeritus
ACE Emeritus

Nice solution @JohnJPS

pawel_toczynski
5 - Atom

I like the R solution, thanks! We're hoping for a bit of randomnes, not an exact split ideally and that seems to the job. It seems fairly scalable too as we need to run the same process independently for a few [independent] sets of metrics and combine everything together. I'll give it a go now...

Labels