
Alteryx Designer Discussions


Which dimension (category) combinations are driving a metric?


Hi All,


First post. Here goes. 


So I have a number of different dimensions (or categories, excuse my Tableau speak), and I want to understand which combinations of these dimensions are important in driving a target metric. My sense is that my problem is with framing the question, so let me use an example and we might be able to figure out the details:


Let's say I am the support department of a company:


  • I have a list of customers calling in and the number of support cases they raise, along with some information about each customer:
    • What products they have purchased
    • How old they are
    • Where they are from
    • How educated they are
    • How long they have been a customer
    • Whether they have attended a training webinar or event
  • The hypotheses within the business are that:
    • Customers that are new and young raise more support cases.
    • Customers with a specific set of products raise more support cases.
    • Customers from some specific regions are challenging and raise more support cases.
    • Customers that are trained raise fewer support cases.

I want to test the validity of these hypotheses. The difficulty is that these groups are not mutually exclusive, so teasing out the relationships is hard. Ultimately I want to create profiles (clusters, I guess) that have different case-generating behaviors.
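A simple first pass at overlapping groups is to compare the average number of cases for every observed combination of dimensions, not just for each dimension on its own. The sketch below assumes the data is a list of per-customer records; the field names (`tenure`, `age`, `trained`, `cases`), the helper `mean_cases_by`, and all values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up example records, one dict per customer (field names are hypothetical).
customers = [
    {"tenure": "new", "age": "young", "trained": False, "cases": 6},
    {"tenure": "new", "age": "young", "trained": True, "cases": 2},
    {"tenure": "new", "age": "older", "trained": False, "cases": 3},
    {"tenure": "long", "age": "young", "trained": False, "cases": 3},
    {"tenure": "long", "age": "older", "trained": True, "cases": 1},
    {"tenure": "long", "age": "older", "trained": False, "cases": 2},
]


def mean_cases_by(records, dims, target="cases"):
    """Return {combination of dimension values: (customer count, mean target)}."""
    groups = defaultdict(list)
    for r in records:
        key = tuple(r[d] for d in dims)  # e.g. ("new", "young")
        groups[key].append(r[target])
    return {key: (len(vals), mean(vals)) for key, vals in groups.items()}


# Average cases for each tenure x age combination seen in the data.
for combo, (n, avg) in sorted(mean_cases_by(customers, ["tenure", "age"]).items()):
    print(combo, n, avg)
```

With real data the count matters as much as the mean: a combination seen for only a handful of customers can produce an extreme average purely by chance.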


Why? So that I can then predict the case volume I can expect if the number of customers within a specific profile increases in the future.
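Once profiles and their per-customer case rates exist, the projection itself is simple arithmetic: expected cases are the sum, over profiles, of (customers in the profile × average cases per customer). A minimal sketch, assuming each profile's rate stays constant as the profile grows; the profile names, rates, and counts are hypothetical.

```python
def projected_cases(cases_per_customer, customer_counts):
    """Expected total cases = sum over profiles of (customers in profile * mean cases per customer).

    Assumes each profile's case rate does not change as its size changes.
    """
    return sum(cases_per_customer[profile] * n for profile, n in customer_counts.items())


# Hypothetical profile rates and customer counts, for illustration only.
rates = {"new_young": 4.0, "trained": 1.5, "other": 2.5}
today = {"new_young": 100, "trained": 200, "other": 300}
next_year = {**today, "new_young": 150}  # suppose only the new/young profile grows

print(projected_cases(rates, today))      # 1450.0
print(projected_cases(rates, next_year))  # 1650.0
```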


Any help would be appreciated. 


Many thanks,






Do I need a forest model... maybe


You will likely need to run and analyze a handful of different statistical tools, and their associated outputs, to arrive at a concise answer for your situation.


As you've noted, there are probably individual variable features (age alone might be a driver) and combination features (a certain product within a certain age group might be particularly problematic). Different models are good at different things.
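One way to tell a combination effect from two individual effects is a 2×2 interaction contrast: measure how much a product raises cases among young customers, then subtract how much it raises cases among older customers. A result near zero means the product's effect does not depend on age; a large result suggests the combination itself matters. This is a minimal sketch on made-up data; the boolean fields `young` and `has_product_x` and the helper `interaction_effect` are hypothetical.

```python
from statistics import mean


def interaction_effect(records, a, b, target="cases"):
    """2x2 interaction contrast for two boolean dimensions a and b.

    Returns (effect of b when a is True) - (effect of b when a is False).
    Every one of the four cells must contain at least one record,
    otherwise statistics.mean raises StatisticsError.
    """
    cell = {}
    for av in (True, False):
        for bv in (True, False):
            vals = [r[target] for r in records if r[a] == av and r[b] == bv]
            cell[(av, bv)] = mean(vals)
    return (cell[(True, True)] - cell[(True, False)]) - (cell[(False, True)] - cell[(False, False)])


# Made-up records: product X looks bad mainly for young customers.
records = [
    {"young": True, "has_product_x": True, "cases": 8},
    {"young": True, "has_product_x": True, "cases": 6},
    {"young": True, "has_product_x": False, "cases": 2},
    {"young": True, "has_product_x": False, "cases": 2},
    {"young": False, "has_product_x": True, "cases": 3},
    {"young": False, "has_product_x": True, "cases": 3},
    {"young": False, "has_product_x": False, "cases": 2},
]

# (7 - 2) - (3 - 2) = 4 for this toy data
print(interaction_effect(records, "young", "has_product_x"))
```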


Your end result will probably be some sort of cluster, that is, a description of the combination of features that truly drives your metric.


I would recommend first running lower-level statistical models to remove the noise from your data set and then, once you have removed the variables that aren't really drivers, using something like k-means clustering for your final result.
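To show what the k-means step does (Alteryx has its own clustering tools; this is not their implementation), here is a plain Lloyd's-algorithm sketch in standard-library Python. It assumes the surviving drivers are already numeric; the two features (tenure in years, cases raised) and all values are made up, and the centroids are naively seeded with the first `k` points so the example is deterministic.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on tuples of numbers.

    Initial centroids are simply the first k points; real implementations
    use smarter seeding and several restarts.
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


# Made-up customers described by (tenure in years, cases raised).
points = [(0.5, 6), (0.2, 5), (0.4, 7), (5, 1), (6, 2), (4, 1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because k-means relies on Euclidean distance, features on larger scales dominate the result, so in practice each feature is usually standardized before clustering; the number of clusters `k` is also a choice you have to make and check.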


Breaking this apart: I would look at the correlation between each individual factor and your predicted variable (call volume), then fit a forest model on the factors that survive the first step, and then roll this up into a clustering model.
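The first screening step could look roughly like this: compute the Pearson correlation between each candidate factor and the case count, and keep only factors whose absolute correlation clears a threshold. The threshold of 0.3, the feature names, and the data are all assumptions for illustration; yes/no flags are coded 0/1. Correlation only captures individual, roughly linear relationships, which is why a forest model on the survivors is still worth running.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined if either list is constant


def screen_features(features, target, threshold=0.3):
    """Keep {name: r} for features whose |correlation with target| >= threshold."""
    kept = {}
    for name, values in features.items():
        r = pearson(values, target)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Made-up data: six customers, case counts and three candidate factors.
cases = [6, 5, 7, 1, 2, 1]
features = {
    "tenure_years": [0.5, 0.2, 0.4, 5, 6, 4],
    "trained": [0, 0, 0, 1, 1, 1],   # 1 = attended a training webinar/event
    "region_x": [0, 1, 0, 0, 1, 0],  # 1 = customer is in a hypothetical region X
}

for name, r in screen_features(features, cases).items():
    print(f"{name}: r = {r:.2f}")
```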


Long story short, statistics is often a journey-like process, rarely a single step, but all of these steps and the associated tools are available within the Alteryx suite.


Let me know if this helps!

I second @ZacharyM's comments. Here are a few other tools to research: Model Comparison (not installed by default; available in the Alteryx Gallery), MB Rules, and of course Random Forest and Decision Tree.
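The core idea behind decision trees (and random forests, which average many trees grown on bootstrap samples with random subsets of features) is to repeatedly pick the split that best separates high-case from low-case customers. The sketch below shows only that single step: it tries every "feature == value" split and keeps the one that most reduces the sum of squared errors in the case count. It is an illustration of the idea, not how the Alteryx tools are implemented; the data, field names, and helpers `sse` and `best_split` are made up.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(records, features, target="cases"):
    """Return (feature, value, reduction in SSE) for the best single 'feature == value' split."""
    base = sse([r[target] for r in records])
    best = (None, None, 0.0)
    for f in features:
        for v in sorted({r[f] for r in records}, key=str):
            left = [r[target] for r in records if r[f] == v]
            right = [r[target] for r in records if r[f] != v]
            if not left or not right:
                continue  # a split must put customers on both sides
            gain = base - sse(left) - sse(right)
            if gain > best[2]:
                best = (f, v, gain)
    return best


# Made-up records, one dict per customer (field names are hypothetical).
customers = [
    {"tenure": "new", "age": "young", "trained": False, "cases": 6},
    {"tenure": "new", "age": "young", "trained": True, "cases": 2},
    {"tenure": "new", "age": "older", "trained": False, "cases": 3},
    {"tenure": "long", "age": "young", "trained": False, "cases": 3},
    {"tenure": "long", "age": "older", "trained": True, "cases": 1},
    {"tenure": "long", "age": "older", "trained": False, "cases": 2},
]

feature, value, gain = best_split(customers, ["tenure", "age", "trained"])
print(feature, value, round(gain, 3))
```

A full tree applies the same search recursively to each side of the split; on this toy data, training status gives a larger reduction than either tenure or age.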