Correctly Name The Oversample Field Tool

A quite minor, pedantic issue from me today. 


Currently, the Oversample Field Tool's naming and configuration suggest that the tool oversamples data:


However, I would argue that the tool undersamples data instead.

Here are a few sources that explain this much better than I can:

And an image taken from Medium:


Effectively, the goal of either approach is to end up with a similar (or the same) number of records in each class. Undersampling takes a random sample of the majority class, so you end up with a smaller dataset than you started with. Oversampling duplicates records within the minority class, so you end up with a larger dataset.
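To make the difference concrete, here is a minimal sketch in plain Python. It is only an illustration of the two definitions above, not how the Alteryx tool is implemented; the function names, the `(label, value)` record layout, and the 90/10 example data are all my own assumptions.

```python
import random
from collections import Counter


def _group_by_class(records):
    """Group (label, value) records into a dict keyed by label."""
    by_class = {}
    for record in records:
        by_class.setdefault(record[0], []).append(record)
    return by_class


def undersample(records, seed=0):
    """Randomly sample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(records)
    target = min(len(rows) for rows in by_class.values())
    out = []
    for rows in by_class.values():
        # Sampling without replacement: no record appears twice.
        out.extend(rng.sample(rows, target))
    return out


def oversample(records, seed=0):
    """Duplicate random records so every class reaches the size of the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(records)
    target = max(len(rows) for rows in by_class.values())
    out = []
    for rows in by_class.values():
        out.extend(rows)  # keep every original record
        # Sampling with replacement: the extra rows are duplicates.
        out.extend(rng.choices(rows, k=target - len(rows)))
    return out


# Hypothetical imbalanced input: 90 "no" records and 10 "yes" records.
records = [("no", i) for i in range(90)] + [("yes", i) for i in range(10)]

under = undersample(records)
over = oversample(records)

print(len(under), dict(Counter(label for label, _ in under)))  # 20 records, 10 per class
print(len(over), dict(Counter(label for label, _ in over)))    # 180 records, 90 per class
```

The undersampled output is smaller than the input (20 records instead of 100) and the oversampled output is larger (180 records), which is the distinction that matters for the naming.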


When using the Oversample tool within Alteryx, using the example workflow for reference:


When summarizing the input:


And the output:


It's clear that the data has actually been undersampled: random samples have been taken from the majority class to match the size of the minority class, rather than duplicate minority records being created.

I would suggest a quick renaming of the tool to "Undersample Field Tool", along with a matching update to the documentation, to avoid confusing new users of the platform.


Kind Regards,


17 - Castor
Great spot @TheOC ! 😄

Alteryx Community Team
Status changed to: Accepting Votes
14 - Magnetar
How about also adding functionality that lets the user choose whether they want to oversample or undersample their data?


15 - Aurora
Love the idea - I propose a dilemma... what do you then call the tool that can both over and under-sample data? 😂

7 - Meteor

I would also like a tool that provides the option of undersampling or oversampling.