Hello - I've been tasked with predicting which locations are at risk of union activity. I have created a data set with ~120 locations and included 25-30 different data points that potentially contribute. Unfortunately, this event has never occurred at any of them, so I have no outcome variable. Could anyone recommend the best approach to tackle this problem? I've created a few models in the past, but I've always had an outcome variable to rely on. Any help is greatly appreciated. Thanks!
Hi @Tronido ,
I'm not an expert in this matter, but I think you need to look into unsupervised machine learning, for instance, clustering models where you group your data by statistical similarities.
Depending on your data and how many clusters you generate, one of them may represent the risk of union activity. Of course, this is done iteratively: you adjust the inputs and the number of clusters until you find groups that mean something to you.
https://towardsdatascience.com/unsupervised-learning-and-data-clustering-eeecb78b422a
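To make the clustering idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the six locations, the two factors (complaints per 100 employees, pay percentile) and the choice of two clusters are made up, and the small k-means routine is hand-rolled only so the example has no dependencies. In practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import random
from statistics import mean, pstdev

# Hypothetical toy data: [complaints per 100 employees, pay percentile].
data = {
    "Loc A": [12.0, 30.0],
    "Loc B": [11.0, 35.0],
    "Loc C": [10.0, 28.0],
    "Loc D": [2.0, 70.0],
    "Loc E": [3.0, 75.0],
    "Loc F": [1.0, 68.0],
}

def standardize(rows):
    """Z-score each column so no single factor dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Basic k-means: assign to nearest center, recompute centers, repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [mean(c) for c in zip(*members)]
    return labels, centers

names = list(data)
points = standardize([data[n] for n in names])
labels, centers = kmeans(points, k=2)

for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

The cluster numbers themselves carry no meaning. Someone still has to look at what the locations in each group have in common and decide whether any group plausibly corresponds to higher risk.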
Best,
Fernando Vizcaino
Hi @Tronido
For any kind of predictive model or machine learning to estimate the probability of an event, the event has to have occurred at least once. Without a sample picture of a cat, or at least the concept of a cat (four legs, fur, long tail), all you can do is create categories of input, some of which may approximate a cat. But the model won't be able to say it's a cat.
For your case, I think you need to look outside your organization and include some data from companies where unions have been formed, i.e. use someone else's picture of a cat. Without this, you can still look at some measure of what makes the formation of a union possible. Possible factors include the number of employee complaints, salary rankings compared to other similar companies, etc. Clustering on these factors may highlight locations that are more likely to unionize, e.g. Location A is more likely than Location B. But without the positive cases, you won't be able to say how much more likely, or give any kind of absolute measure such as "this location has a 50% chance of unionizing in the next year".
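As a rough illustration of the "Location A is more likely than Location B" kind of output, here is a small sketch that ranks locations by a composite score. It is a simpler stand-in for clustering, not a method anyone in this thread has validated. The factor names, the values, and the assumed direction of each factor (more complaints raises risk, higher pay lowers it) are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical factors. The sign is an ASSUMPTION about which direction
# raises risk: +1 means higher values are riskier, -1 means lower values are.
FACTOR_DIRECTION = {"complaints_per_100": +1, "pay_percentile": -1}

# Made-up values for three locations.
locations = {
    "Location A": {"complaints_per_100": 12.0, "pay_percentile": 30.0},
    "Location B": {"complaints_per_100": 5.0, "pay_percentile": 50.0},
    "Location C": {"complaints_per_100": 1.0, "pay_percentile": 80.0},
}

def risk_ranking(locations, directions):
    """Sum signed z-scores per location and sort from highest to lowest."""
    names = list(locations)
    scores = dict.fromkeys(names, 0.0)
    for factor, sign in directions.items():
        values = [locations[n][factor] for n in names]
        mu = mean(values)
        sd = pstdev(values) or 1.0  # guard against a constant factor
        for n, v in zip(names, values):
            scores[n] += sign * (v - mu) / sd
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = risk_ranking(locations, FACTOR_DIRECTION)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

The scores only order the locations relative to each other. They are not probabilities, and equal weighting of the factors is itself an assumption that would need domain input.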
Dan