I have a number of different dimensions (or categories; excuse my Tableau speak), and I want to understand which combinations of these dimensions are important in driving a target metric. My sense is that my problem is with framing the question, so let me use an example and we might be able to figure out the detail:
Let's say I am the support department of a company:
I have a list of customers calling in and the number of support cases they raise, in addition to some information about the customers:
What products they have purchased
How old they are
Where they are from
How educated they are
How long they have been a customer
Whether they have attended a training webinar or event
The hypotheses within the business are that:
Customers that are new and young raise more support cases.
Customers with a specific set of products raise more support cases.
Customers from some specific regions are challenging and raise more support cases.
Customers that are trained raise fewer support cases.
I want to test the validity of these hypotheses. The challenge is that these groups are not mutually exclusive (a customer can be new, young, and trained all at once), so teasing out the relationships is difficult. Ultimately I want to create profiles (clusters, I guess) that have different case-generating behaviors.
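To make the overlap problem concrete, here is a minimal sketch in Python (an assumption; the thread itself is about Alteryx tools). The records are invented for illustration: most trained customers are also established customers, so a naive trained-versus-untrained comparison mixes the training effect with the tenure effect. Comparing within each tenure group separates the two.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration:
# (is_new_customer, is_trained, cases_raised)
customers = [
    (True, False, 9), (True, False, 11), (True, False, 10),
    (True, True, 8),
    (False, False, 4),
    (False, True, 3), (False, True, 2), (False, True, 4),
]

def mean_cases(rows):
    """Average number of cases raised across a list of records."""
    return mean(r[2] for r in rows)

# Naive comparison: untrained minus trained, ignoring tenure.
trained = [r for r in customers if r[1]]
untrained = [r for r in customers if not r[1]]
naive_gap = mean_cases(untrained) - mean_cases(trained)

# Stratified comparison: the same gap, computed separately
# for new customers and for established customers.
strata = defaultdict(lambda: {True: [], False: []})
for row in customers:
    strata[row[0]][row[1]].append(row)

stratified_gap = {
    is_new: mean_cases(groups[False]) - mean_cases(groups[True])
    for is_new, groups in strata.items()
}

print("naive gap:", naive_gap)
print("gap within new customers:", stratified_gap[True])
print("gap within established customers:", stratified_gap[False])
```

In this made-up data the naive gap is larger than the gap inside either tenure group, which is the kind of distortion overlapping groups can cause. A regression model with all the dimensions as inputs generalizes the same idea to more than two variables.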
Why? So that I can then go on to predict the case volumes I can expect if the number of customers within a specific profile increases in the future.
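The forecasting step could look something like the sketch below, assuming profiles have already been defined and each has a stable average case rate. The profile names, rates, and customer counts are hypothetical placeholders, not real figures.

```python
# Hypothetical average cases per customer per month for each profile.
cases_per_customer = {
    "new_untrained": 10.0,
    "new_trained": 8.0,
    "established": 3.0,
}

def forecast_cases(customer_counts, rates):
    """Expected case volume: sum over profiles of customers x cases per customer."""
    return sum(customer_counts[p] * rates[p] for p in customer_counts)

# Hypothetical customer counts today, and a scenario in which
# the new, untrained profile grows from 100 to 150 customers.
current = {"new_untrained": 100, "new_trained": 50, "established": 400}
planned = dict(current, new_untrained=150)

baseline = forecast_cases(current, cases_per_customer)
scenario = forecast_cases(planned, cases_per_customer)

print("baseline cases:", baseline)
print("scenario cases:", scenario)
print("extra cases:", scenario - baseline)
```

This only works if the per-profile rate stays roughly constant as the profile grows, which is itself an assumption worth checking against history.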
You will likely need to run and analyze a handful of different statistical tools and their associated outputs to arrive at a concise answer or expression for your situation.
As you've noted, there are probably individual variable features (age might be a driver alone) and combination variable features (a certain product within a certain age group is particularly problematic perhaps). Different models are good at different things.
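One quick way to see whether a combination matters beyond the individual variables is to compare the effect of one dimension within each level of another. The sketch below uses invented records and plain Python (an assumption, not an Alteryx workflow): if being young adds about the same number of cases for every product, age is acting alone; if the uplift differs a lot by product, the combination is a feature in its own right.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration:
# (product, age_band, cases_raised)
records = [
    ("A", "young", 4), ("A", "young", 6),
    ("A", "older", 2), ("A", "older", 2),
    ("B", "young", 12), ("B", "young", 14),
    ("B", "older", 3), ("B", "older", 3),
]

# Mean cases for every (product, age_band) cell.
cells = defaultdict(list)
for product, age_band, cases in records:
    cells[(product, age_band)].append(cases)
cell_mean = {key: mean(values) for key, values in cells.items()}

# Uplift from being young, computed separately for each product.
young_effect = {
    product: cell_mean[(product, "young")] - cell_mean[(product, "older")]
    for product in ("A", "B")
}

print(young_effect)
```

Here the uplift for product B is much larger than for product A, which would point to a product-by-age combination rather than age alone.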
Your end result will probably be some sort of cluster, since a cluster is what describes the confluence of features that truly drives your situation.
I would recommend running lower-level statistical models first to remove the noise from your data set, then using something like a k-means cluster for your final result once you have removed the variables that aren't necessarily drivers.
Breaking this apart: I would look at the correlation between individual factors and your target variable (case volume), then run a forest model on the factors that survived the first step, then roll this up into a clustering model.
Long story short, statistics is often a journey-like process and rarely a single step, but all of these steps and associated tools are available within the Alteryx suite.
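As a rough illustration of that flow outside Alteryx, here is a minimal standard-library Python sketch of the first and last steps: a correlation screen followed by k-means on the surviving features. The forest step is deliberately left out to keep the example short. The data, the column names, and the 0.3 correlation cutoff are all assumptions made for illustration, and the k-means uses a simple deterministic farthest-point start rather than the random start most tools use.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def min_max(col):
    """Rescale a column to [0, 1] so no feature dominates the distance."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points
        ]
        # Move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                centroids[i] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels

# Hypothetical customer data, invented for illustration.
features = {
    "tenure_months": [1, 2, 3, 2, 24, 30, 36, 28],
    "age": [22, 25, 24, 23, 45, 50, 48, 52],
    "noise": [5, 1, 4, 2, 4, 2, 5, 1],  # an unrelated column
}
cases = [10, 9, 11, 8, 3, 2, 4, 3]

# Step 1: keep features whose correlation with case volume clears the cutoff.
kept = [name for name, col in features.items() if abs(pearson(col, cases)) >= 0.3]

# Final step: cluster customers on the surviving, rescaled features.
points = list(zip(*(min_max(features[name]) for name in kept)))
labels = kmeans(points, k=2)

# Average case volume per cluster gives a first-cut profile.
profile_mean = {
    c: sum(y for y, l in zip(cases, labels) if l == c) / labels.count(c)
    for c in sorted(set(labels))
}

print(kept)
print(labels)
print(profile_mean)
```

On this toy data the unrelated column is screened out and the two clusters separate new, young customers from established, older ones. Real data will be far less tidy, and a correlation screen can miss variables that only matter in combination, which is part of why the forest step sits in the middle.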