We’ve extended Inspire Early Bird Pricing until March 1. Register now and enjoy 20% off conference passes and 10% off training passes. P.S. Don’t forget to bring friends! When you sign up for five or more tickets, you get an extra 20% discount on conference passes.
Learn more now.
Effectively either step is to create a similar (or same) number of records between each class. Under sampling is the process of taking samples from the majority class, and ending up with a smaller dataset than started with. Over sampling is the process of duplicating records within the minority class, and creates a larger dataset.
When using the Oversample tool within Alteryx, using the example workflow for reference:
When summarizing the input:
And the output:
It's clear that the data has actually been under sampled, in that random samples have been taken from the majority class to match the minority, rather than creating duplicate minority records.
I would suggest a quick renaming of the tool to "Undersample Field Tool", and documentation to not cause confusion to new users of the platform.