Hello!
A quite minor, pedantic issue from me today.
Currently, the Oversample Field Tool's naming and configuration suggest that the tool oversamples data:
However, I would argue that the tool undersamples data instead.
Here are a few sources that explain this much better than I can:
And an image taken from Medium:
Effectively, the goal of either technique is to end up with a similar (or the same) number of records in each class. Undersampling takes a random sample of the majority class, so you end up with a smaller dataset than you started with. Oversampling duplicates records within the minority class, so you end up with a larger dataset.
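To make the difference concrete, here is a minimal sketch in plain Python. It is not how the Alteryx tool is implemented; the function names `undersample` and `oversample`, the `label_of` parameter, and the toy data are all my own assumptions for illustration.

```python
import random
from collections import Counter


def _group_by_class(records, label_of):
    """Group records into lists keyed by their class label."""
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    return by_class


def undersample(records, label_of, seed=0):
    """Randomly sample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(records, label_of)
    target = min(len(rows) for rows in by_class.values())
    out = []
    for rows in by_class.values():
        out.extend(rng.sample(rows, target))  # sample without replacement
    return out


def oversample(records, label_of, seed=0):
    """Duplicate random records in every class up to the size of the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(records, label_of)
    target = max(len(rows) for rows in by_class.values())
    out = []
    for rows in by_class.values():
        out.extend(rows)  # keep all original records
        out.extend(rng.choices(rows, k=target - len(rows)))  # add duplicates
    return out


# Toy data (made up): 10 "yes" records and 90 "no" records.
records = [(i, "yes") for i in range(10)] + [(i, "no") for i in range(10, 100)]
label = lambda record: record[1]

under = undersample(records, label)
over = oversample(records, label)

print(len(under), Counter(label(r) for r in under))  # smaller than the input
print(len(over), Counter(label(r) for r in over))    # larger than the input
```

With this toy input of 100 records, the undersampled output has 20 records (10 per class) and the oversampled output has 180 records (90 per class).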
When using the Oversample tool within Alteryx, using the example workflow for reference:
When summarizing the input:
And the output:
It's clear that the data has actually been undersampled: random samples have been taken from the majority class to match the minority class, rather than duplicate minority records being created.
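The same check can be written down as a small Python sketch that compares per-class record counts before and after sampling. The function name `sampling_direction` and the counts below are hypothetical; they are not the numbers from the example workflow.

```python
def sampling_direction(before, after):
    """Compare per-class record counts (label -> count) before and after sampling."""
    shrunk = [c for c in before if after.get(c, 0) < before[c]]
    grown = [c for c in before if after.get(c, 0) > before[c]]
    if shrunk and not grown:
        return "undersampled"  # at least one class lost records, none gained
    if grown and not shrunk:
        return "oversampled"   # at least one class gained records, none lost
    if not shrunk and not grown:
        return "unchanged"
    return "mixed"


# Hypothetical summary counts, for illustration only.
before = {"yes": 100, "no": 900}
after = {"yes": 100, "no": 100}

print(sampling_direction(before, after))
```

If the majority class shrinks while the minority class keeps its original count, as in these made-up numbers, the result is "undersampled".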
I would suggest a quick renaming of the tool to "Undersample Field Tool", along with a matching update to the documentation, so as not to confuse new users of the platform.
Kind Regards,
TheOC