Alteryx Designer Desktop Discussions

Unique Sample Selection based on multiple criteria

aiste_griffiths
5 - Atom

Hi all, 

 

I'm fairly new to Alteryx and would love some guidance on how to handle an issue I have.

I am trying to build a workflow that does sample selection for me.

 

Conditions: 

1. Sample size is 4% (or minimum 8) based on Closed Correction Request number (raw data only shows closed cases).

2. If there are more Areas of Issue than above sample size, the sample size will be the number of areas of issue we have.

3. 50% of sampling needs to come from cases that have keywords.

4. Each agent needs to be sampled in proportion to how many requests they closed: the more they closed, the more samples are picked for them.

 

With my basic knowledge, I already built:

 

Condition 1 - workflow tells me what's the number I need to sample as per requirement (4% or 8)

Condition 2 - workflow tells me if I have to sample more than Condition 1 tells me.

I have also already created a keyword pool.

 

I know how many samples I need to pick (8)

I know that 50% of samples need to come from the keyword pool (4)

I know how many samples I need to pick for each agent.

 

However, I can't figure out how to actually combine everything together and pick samples so that:

 

  • 50% of samples come from cases that have keywords AND
  • each area of issue is sampled. In this case there should be 8 areas of issues sampled AND
  • Each agent is sampled based on how many cases they closed.
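One way to read the three requirements above as a single procedure, outside Alteryx, is rejection sampling: draw each agent's quota at random, then keep the draw only if every area appears and at least half of the picks have a keyword. The sketch below is plain Python, not an Alteryx workflow. The field names `agent`, `area` and `keyword`, the function name `draw_sample`, and the reading of "50%" as "at least half" are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def draw_sample(cases, agent_quota, max_tries=10_000, seed=None):
    """Return a list of cases meeting all three rules, or None if no draw succeeds.

    cases       -- list of dicts with hypothetical keys "agent", "area", "keyword"
    agent_quota -- dict mapping agent name to the number of samples to pick
    """
    rng = random.Random(seed)

    # Group the cases by agent so each agent's quota can be drawn separately.
    by_agent = defaultdict(list)
    for case in cases:
        by_agent[case["agent"]].append(case)

    all_areas = {case["area"] for case in cases}
    size = sum(agent_quota.values())

    for _ in range(max_tries):
        sample = []
        for agent, quota in agent_quota.items():
            # random.sample picks `quota` distinct cases for this agent.
            sample.extend(rng.sample(by_agent[agent], quota))

        sampled_areas = {case["area"] for case in sample}
        keyword_hits = sum(1 for case in sample if case["keyword"])

        # Assumption: "50% from keyword cases" means at least half.
        if sampled_areas == all_areas and 2 * keyword_hits >= size:
            return sample

    # The rules may be impossible to satisfy together for this data.
    return None
```

If the function returns `None`, the constraints probably conflict for that month's data (for example, more areas than samples), which is a useful signal in itself. A retry loop like this is simple rather than efficient; it is meant to show the logic, not to replace an iterative macro.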

 

Is there a way to achieve this?

Thank you so much in advance in case I get an answer :)

Aiste

 

I am attaching raw data + my current workflow.

 

6 REPLIES
rzdodson
12 - Quasar

@aiste_griffiths definitely possible, just going to take some time to get all of the conditions ingested. I am getting the sense we are going to need to go down nested macro territory with this one (batch-to-iterative, or nested iterative). Will try to spin a solution soon for you.

Curious though, if we are ultimately sampling at the individual level to see how cases are resolved, why is the 100% representation of Areas important? Can you shed light on this one? Curious if there is an opportunity to decrease the complexity of macro and sampling logic.

 

In a different vein, I think as the data set gets smaller, the complexity of your sampling logic will have steps in it that'll make it next to impossible to have all Areas represented in the sample.

 

Edit: attaching Alteryx Analytical App to this post as a potential solution.

Solution.png

aiste_griffiths
5 - Atom

@rzdodson

First of all thanks so much for your quick response, this is my first time posting on Community Page - I wasn't sure what to expect! 😅

 

To answer your question about representation of areas: we want each Area of Issue sampled to make sure that the team knows how to handle each Area, if that makes sense? For that reason we are even willing to increase the sample size (e.g. the rules say 4% or min 8, but if there were 9 areas of issue in this case, the sample size would become 9, not 8). I regret not adding an extra area of issue to the raw data to make it 9 for this example.

 

Just a thought, if it's going to be next to impossible to have all areas of issue represented, could I manually replace Areas of issue that were not selected for sampling at the final stage? Or am I thinking too manual here? LOL

rzdodson
12 - Quasar

@aiste_griffiths welcome to the Community! There are a bunch of brilliant minds who can definitely help. Tagging a few folks to see if they want to take a crack at this one: @binuacs @alexnajm @Qiu @caltang

 

As far as the count of the Areas, that will be pretty simple to have included. In the workflow that I attached, I have included an Area Count field as a part of the workflow's final output. If we know there are 9 areas, we should see a 9 in that field all the way through the column.

 

The issue I think a model like this will run into is the sampling methodology potentially excluding areas because the sample size is too small. It may just be the dummy data itself. With a raw data set of 72 records and a sample of 8 records (the 4% / minimum 8 rule), I am not confident we would consistently get all areas accounted for. I am appending two pictures below of running the workflow to show what I am talking about. But as the raw data set grows and/or the sampling percentage increases, the likelihood that all areas are represented will increase.
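To put a rough number on that concern, the chance that a purely random draw covers every area can be computed exactly with inclusion-exclusion. The sketch below is plain Python and makes two loud assumptions: the draw is a simple random sample without replacement (the attached workflow samples per agent, so its odds will differ), and the area sizes passed in are hypothetical, since the real split of the 72 records is not shown in the thread.

```python
from itertools import combinations
from math import comb


def coverage_probability(area_sizes, sample_size):
    """Probability that a simple random sample contains every area at least once.

    area_sizes  -- list with the number of records in each area (hypothetical)
    sample_size -- number of records drawn without replacement
    """
    total = sum(area_sizes)
    favourable = 0
    # Inclusion-exclusion over the set of areas that are missing from the sample.
    for r in range(len(area_sizes) + 1):
        for missing in combinations(area_sizes, r):
            # Count samples drawn only from records outside the "missing" areas.
            favourable += (-1) ** r * comb(total - sum(missing), sample_size)
    return favourable / comb(total, sample_size)


# Hypothetical split: 72 records spread evenly over 8 areas, 8 records sampled.
print(coverage_probability([9] * 8, 8))
# Same hypothetical split with a larger sample of 20 records.
print(coverage_probability([9] * 8, 20))
```

Under that even-split assumption the first figure comes out at roughly 0.4%, and it climbs as the sample size grows, which matches the behaviour described above. Real data with unevenly sized areas would generally make full coverage even less likely.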

 

First Run of Workflow

Sampling.png

 

Second Run of Workflow

Sampling2.png

caltang
17 - Castor

I’m actually having a hard time fully understanding your requirements. Do you mind dumbing it down for me?

Nested macros may be going a bit far, but I like @rzdodson's solution!

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
aiste_griffiths
5 - Atom

Welcome! @caltang

 

Apologies it's not too clear.

 

So:

1. Each month I need to select a certain number of samples from the raw data (attached).

 

2. That number changes each month. The requirement is 4% of the total number of cases or a minimum of 8 cases, whichever is higher. In this case there are 72 cases, and 4% of 72 is 2.88, so we need to sample 8.
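The sample-size rule can be written down as a small function. This is a plain Python sketch rather than an Alteryx formula; the function name `required_sample_size` is made up, and rounding a fractional 4% figure up to the next whole case is an assumption, since the thread does not say how fractions are handled. It also folds in the rule from the original post that the number of areas can raise the sample size.

```python
def required_sample_size(total_cases, area_count, rate_percent=4, minimum=8):
    """Return the number of cases to sample for one month.

    Takes the largest of: rate_percent of total_cases (rounded up, an
    assumption), the fixed minimum, and the number of distinct areas.
    """
    # Ceiling division in integers avoids floating-point rounding surprises.
    by_rate = -(-total_cases * rate_percent // 100)
    return max(by_rate, minimum, area_count)


print(required_sample_size(72, 8))    # 72 cases, 8 areas
print(required_sample_size(72, 10))   # 72 cases, 10 areas
print(required_sample_size(500, 10))  # 500 cases, 10 areas
```

With 72 cases the 4% figure rounds to 3, so the minimum of 8 applies; with 10 areas the result becomes 10; with 500 cases the 4% figure of 20 wins.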

 

 

Sampling selection itself has conditions:

 

1. Each agent that appears in the raw data needs to be sampled. We distribute the number of samples across agents based on how many cases they closed. This worked out perfectly in rzdodson's workflow (e.g. Ann 3 cases, Ben 1 case, Michael 1 case, Stacey 1 case, Tom 2 cases = 8 cases in total).

 

2. 50% of samples need to have a "keyword" (refer to “Case Summary” column, should be not null).

 

3. The issue seems to sit with the "Area" column. There is a condition that each "Area" needs to be selected for sampling. I attached updated raw data; it has 10 areas, which means we actually need to select 10 cases to cover all “Areas”. So although we first thought we needed to select 8 cases, because there are 10 “Areas” the sample size increases to 10.

"Area" would not necessarily always increase the sample size, it all depends on overall number of cases e.g. if we had overall 500 cases, 4% of that is 20 (all 10 areas would be covered within those 20 samples).

 

So all in all, I am trying to select 10 cases that cover all “Areas”, with each agent sampled based on how many cases they did and 50% of the samples coming from cases where the “Case Summary” column is not null. Hope this clarifies? Thanks so much in advance for your insights! Aiste
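The per-agent distribution described in this post can be sketched with the largest-remainder method: give each agent the whole-number part of their proportional share, hand any leftover samples to the biggest fractional remainders, and make sure nobody ends up with zero. This is plain Python, not the logic of the attached workflow, and the function name `allocate_by_agent` is made up. The closed-case counts in the example are hypothetical (they are not taken from the attached data) and were chosen only so that they total 72.

```python
def allocate_by_agent(closed_counts, sample_size):
    """Split sample_size across agents in proportion to cases closed.

    closed_counts -- dict mapping agent name to number of closed cases
    Every agent receives at least one sample.
    """
    if sample_size < len(closed_counts):
        raise ValueError("sample size is smaller than the number of agents")

    total = sum(closed_counts.values())
    # Whole-number share and remainder, kept in integers to stay exact.
    alloc = {a: c * sample_size // total for a, c in closed_counts.items()}
    remainder = {a: c * sample_size % total for a, c in closed_counts.items()}

    # Hand the leftover samples to the agents with the largest remainders.
    leftover = sample_size - sum(alloc.values())
    for agent in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[agent] += 1

    # Any agent still at zero borrows one sample from the largest allocation.
    for agent in alloc:
        if alloc[agent] == 0:
            donor = max(alloc, key=alloc.get)
            alloc[donor] -= 1
            alloc[agent] = 1
    return alloc


# Hypothetical closed-case counts (NOT from the attached data), totalling 72.
closed = {"Ann": 27, "Ben": 9, "Michael": 9, "Stacey": 9, "Tom": 18}
print(allocate_by_agent(closed, 8))
```

With these made-up counts the result is Ann 3, Ben 1, Michael 1, Stacey 1, Tom 2, the same split quoted above. When the sample size grows to 10 to cover all areas, the same function can be called with 10 instead of 8.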

aiste_griffiths
5 - Atom

Hi @rzdodson

 

Thanks a million for screenshots, my little Alteryx brain understands what you mean now! LOL

 

I reviewed previous months' records and it varies really! 68 in April, 72 in March, 161 in Feb, 86 in January...... I updated attached raw data to have more Areas 😂
