
SOLVED

New assisted modelling (ML tools) oversampling

paul_houghton
11 - Bolide

I have a rather imbalanced conversion dataset (5% positive values), and I'm trying to use the new ML tools to predict the conversion target.

Traditionally, I would oversample the dataset to balance the target variable, but assisted modelling doesn't appear to balance the target and just predicts everything as negative, which gives very high accuracy.

 

Does the assisted modelling have any way of treating or managing imbalanced datasets for classification at all?
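
To illustrate why accuracy is misleading here, the following is a minimal sketch with made-up data (1,000 hypothetical records at the 5% positive rate mentioned above). It is not Alteryx output; it only shows that a model predicting "negative" for every record scores 95% accuracy while finding none of the conversions.

```python
# Hypothetical data: 1,000 records with a 5% positive rate, as in the question.
labels = [1] * 50 + [0] * 950

# A "model" that always predicts the majority (negative) class.
predictions = [0] * len(labels)

# Accuracy: share of records where the prediction matches the label.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: share of actual positives that the model found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.95
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

Recall (or precision, or AUC) is usually a more informative metric than accuracy for a target this rare.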

DavidSta
Alteryx

Hi @paul_houghton,

You can place the oversampling tool directly before Assisted Modeling, as oversampling is not built into the new toolset.

You can use any of the existing tools before and after the AM tools.

 

Best regards,

David
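
For readers who want to see what rebalancing before modelling looks like outside Designer, here is a generic sketch of random oversampling in plain Python. The function name `oversample_minority`, the `converted` field, and the data are hypothetical, and this is not necessarily how the Alteryx oversampling tool works internally.

```python
import random

def oversample_minority(rows, target_key, positive_value=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    positives = [r for r in rows if r[target_key] == positive_value]
    negatives = [r for r in rows if r[target_key] != positive_value]
    if not positives or not negatives:
        return list(rows)  # only one class present, nothing to balance
    minority, majority = sorted([positives, negatives], key=len)
    # Sample minority rows with replacement to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced

# Hypothetical training data with a 5% positive rate.
rows = [{"converted": 1}] * 50 + [{"converted": 0}] * 950
balanced = oversample_minority(rows, "converted")
positive_rate = sum(r["converted"] for r in balanced) / len(balanced)
print(len(balanced), positive_rate)  # 1900 0.5
```

Rebalancing like this should be applied to the training data only, so that evaluation and scoring still reflect the original class distribution.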

 

paul_houghton
11 - Bolide

I see. Normally, when the oversampling tool is used, a correction factor
is added in the Score tool (at least in the old R-based Score tool).
Is that not needed?

[EDIT: added back the Image]

paul_houghton_0-1593593908409.png
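
As background on the correction factor, the sketch below shows a standard prior-probability correction for scores from a model trained on resampled data. It is an assumption that the R-based Score tool applies an adjustment of this form; the function name `correct_probability` and the example rates are hypothetical.

```python
def correct_probability(p_sampled, original_rate, sampled_rate):
    """Map a probability from a model trained on resampled data
    back to the original class distribution.

    p_sampled:     predicted positive probability from the resampled model
    original_rate: positive rate in the original data (e.g. 0.05)
    sampled_rate:  positive rate in the resampled training data (e.g. 0.5)
    """
    # Re-weight each class by how much resampling changed its prior.
    pos_weight = original_rate / sampled_rate
    neg_weight = (1.0 - original_rate) / (1.0 - sampled_rate)
    numerator = p_sampled * pos_weight
    return numerator / (numerator + (1.0 - p_sampled) * neg_weight)

# A score of 0.5 from a model trained on 50/50 data corresponds to
# roughly the 5% base rate in the original data.
corrected = correct_probability(0.5, original_rate=0.05, sampled_rate=0.5)
print(round(corrected, 4))  # 0.05
```

Without a correction like this, probabilities from a model trained on balanced data overstate the chance of conversion, although the ranking of records is unchanged.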

 

DavidSta
Alteryx

Assisted Modeling currently doesn't address class imbalance: the "Predict Values" tool has no equivalent of the adjustment you are looking for in the "Score" tool.

For imbalances that aren't too severe, you can fix them manually using the Sample tool, as mentioned above.

paul_houghton
11 - Bolide

Hey @DavidSta, so I understand that the ML tools use the Predict tool rather than the Score tool. In my example the imbalance is 95%-5%. Does this mean that the existing tools cannot address this imbalance (even using the oversampling as mentioned), as there is no way to account for that adjustment when scoring?

DavidSta
Alteryx

Exactly. Assisted Modeling currently cannot address this imbalance, as it is only compatible with the "Predict" tool and not the "Score" tool (Predict is Python-based and Score is R-based).

Currently, the scoring adjustment you are looking for is only possible with the existing Predictive tools (the brown ones).
