I have a rather imbalanced conversion dataset (5% positive values), and I'm trying to use the new ML tools to predict the conversion target.
Traditionally, I would oversample the dataset to balance the target variable, but the assisted modelling doesn't appear to balance the target and just predicts everything as negative, which gives very high accuracy.
Does the assisted modelling have any way of treating or managing imbalanced datasets for classification?
Hi @paul_houghton,
You can place the oversampling tool directly before Assisted Modeling, as oversampling is not included in the new toolset.
You can use any of the existing tools before and after the AM tools.
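For anyone who wants to see what the oversampling step does to the data before it reaches the modeling tools, here is a minimal Python sketch. It is not how the Alteryx oversampling tool is implemented; the function name `oversample_minority` and the `converted` and `visits` fields are made up for illustration, and it simply duplicates positive rows at random until both classes are the same size.

```python
import random


def oversample_minority(rows, target_key, positive_value=1, seed=0):
    """Randomly duplicate positive rows until both classes are the same size.

    Hypothetical helper for illustration only, not an Alteryx API.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[target_key] == positive_value]
    negatives = [r for r in rows if r[target_key] != positive_value]
    if not positives or len(positives) >= len(negatives):
        return list(rows)  # nothing to balance
    # Draw positives with replacement to make up the shortfall.
    extra = rng.choices(positives, k=len(negatives) - len(positives))
    balanced = negatives + positives + extra
    rng.shuffle(balanced)
    return balanced


# Toy data with the 5% positive rate from the question (made-up fields).
rows = [{"converted": 1 if i < 5 else 0, "visits": i} for i in range(100)]
balanced = oversample_minority(rows, "converted")
positive_share = sum(r["converted"] for r in balanced) / len(balanced)
```

Resampling like this is normally applied to the training records only, so that any hold-out data keeps the original 5% rate.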
Best regards,
David
I see. Normally, when the oversampling tool is used, a correction factor is added in the score tool (at least there is in the old R-based score tool).
Is that not needed?
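For context, the kind of correction in question can be sketched in a few lines of Python. This is the standard prior-shift adjustment for probabilities from a model trained on resampled data, and I am assuming, not confirming, that the R-based score tool does something equivalent. The function name and the rates below are illustrative only.

```python
def correct_probability(p, train_rate, true_rate):
    """Rescale a predicted positive probability from the resampled class
    balance (train_rate) back to the original one (true_rate).

    Illustrative only; assumed to resemble, not reproduce, the score tool.
    """
    pos = p * (true_rate / train_rate)
    neg = (1.0 - p) * ((1.0 - true_rate) / (1.0 - train_rate))
    return pos / (pos + neg)


# Model trained on a 50/50 oversampled set; the real positive rate is 5%.
adjusted = correct_probability(0.5, train_rate=0.5, true_rate=0.05)
```

Without an adjustment like this, probabilities from a model trained on a 50/50 sample overstate the chance of conversion in the original 95%-5% data.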
Assisted Modeling doesn’t currently address class imbalances. The "Predict Values" Tool has no equivalent of the adjustment you are looking for in the "Score" Tool.
For imbalances that aren’t too severe, you can fix them manually using the Sample tool, as mentioned above.
Hey @DavidSta, so I understand that the Predict tool is used rather than the Score tool for the ML tools. In my example the imbalance is 95%-5%. Does this mean that the existing tools would not be able to address this imbalance (even using the oversampling as mentioned), as there is no way to account for that adjustment?
Exactly. Assisted Modeling currently cannot address this imbalance, as it is only compatible with the "Predict" Tool and not the "Score" Tool (Predict is Python-based and Score is R-based).
Currently, you can only do the scoring adjustments you are looking for with the existing Predictive Tools (the brown ones).