Alteryx Designer Desktop Discussions


How to Predict Outcome that's Never Occurred

Tronido
5 - Atom

Hello - I've been tasked with predicting which locations are at risk of union activity. I have created a data set with ~120 locations and included 25-30 different data points that potentially contribute. Unfortunately, this event has never occurred. Could anyone recommend the best approach to tackle this problem? I've created a few models in the past, but I've always had an outcome variable to rely on. Any help is greatly appreciated. Thanks!

2 REPLIES
fmvizcaino
17 - Castor

Hi @Tronido ,

 

I'm not an expert in this matter, but I think you need to look into unsupervised machine learning, for instance clustering models, which group your data by statistical similarity.

 

Depending on your data and how many clusters you generate, one of them may represent the locations at risk of union activity. Of course, this is an iterative process: you adjust the inputs and the number of clusters until the groups represent something meaningful to you.
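
As a rough illustration of the clustering idea, here is a minimal sketch in plain Python (standard library only). The location names, the two factors, and all the numbers are made up for the example, and the hand-rolled k-means is a generic stand-in, not the algorithm behind the Alteryx K-Centroids tool. Features are standardized first so that no single factor dominates the distance calculation.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so no single factor dominates the distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels

# Hypothetical data: (complaints per 100 employees, pay percentile vs. peers)
locations = {
    "Loc A": (12.0, 20.0),
    "Loc B": (11.0, 25.0),
    "Loc C": (2.0, 70.0),
    "Loc D": (3.0, 65.0),
    "Loc E": (10.0, 30.0),
    "Loc F": (1.0, 80.0),
}
names = list(locations)
scaled = standardize([list(v) for v in locations.values()])
labels = kmeans(scaled, k=2)

# Group location names by cluster label for inspection
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

The cluster labels themselves carry no meaning. You would still have to inspect each group's profile (here, high complaints and low relative pay) and decide whether it plausibly corresponds to higher risk.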

 

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-K-Centroids-Cluster-An...

https://towardsdatascience.com/unsupervised-learning-and-data-clustering-eeecb78b422a

 

Best,

Fernando Vizcaino

danilang
19 - Altair

Hi @Tronido 

 

For any kind of predictive model or machine learning to estimate the probability of an event, the event has to have occurred at least once. Without a sample picture of a cat, or at least the concept of a cat (e.g. four legs, fur, long tail), all you can do is create categories of input, some of which may approximate a cat. But the model won't be able to say it's a cat.

 

For your case, I think you need to look outside your organization and include some data from companies where unions have been formed, i.e. use someone else's picture of a cat. Without this, you can look at some measure of what makes the formation of a union possible. Possible factors include the number of employee complaints, salary rankings compared to similar companies, etc. Clustering on these factors may highlight locations that are more likely to unionize, e.g. Location A is more likely than Location B. But without the positive cases, you won't be able to say how much more likely, or give any kind of absolute measure such as "Location A has a 50% chance to unionize in the next year."
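
To make the "more likely than, but not how much more likely" point concrete, here is a minimal sketch of a relative ranking in plain Python. The factor names, the directions assigned to them, and the numbers are all hypothetical. Each factor is z-scored across locations and summed with a sign, so the result only orders locations against each other. It is not a probability and has no absolute interpretation.

```python
from statistics import mean, pstdev

# Hypothetical factors. Direction +1 means "higher value = more risk",
# -1 means "lower value = more risk". These are assumptions, not findings.
factors = {"complaints_per_100": +1, "pay_percentile": -1}

# Hypothetical location data
locations = {
    "Loc A": {"complaints_per_100": 12.0, "pay_percentile": 20.0},
    "Loc B": {"complaints_per_100": 3.0, "pay_percentile": 65.0},
    "Loc C": {"complaints_per_100": 7.0, "pay_percentile": 45.0},
}

def relative_scores(locations, factors):
    """Sum of signed z-scores per location; only the ordering is meaningful."""
    scores = {name: 0.0 for name in locations}
    for factor, direction in factors.items():
        values = [row[factor] for row in locations.values()]
        mu = mean(values)
        sd = pstdev(values) or 1.0  # guard against constant factors
        for name, row in locations.items():
            scores[name] += direction * (row[factor] - mu) / sd
    return scores

scores = relative_scores(locations, factors)
ranking = sorted(scores, key=scores.get, reverse=True)  # most at-risk first
print(ranking)
```

Equal weighting of the factors is itself an assumption. With external data from companies that did unionize, those weights could be estimated instead of guessed, which is what would turn a ranking like this into an actual probability model.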

 

Dan

 

 

 

 

 
