Alteryx Designer Desktop Ideas

TheOC · ‎08-25-2022

Hello!
A quite minor, pedantic issue from me today.

Currently, the Oversample Field Tool's naming and configuration suggest that the tool oversamples data.

However, I would argue that the tool undersamples data instead.

Here are a few sources that explain this much better than I can:

And an image taken from Medium:

Effectively, the goal of either technique is to end up with a similar (or the same) number of records in each class. Undersampling takes a random sample from the majority class and ends up with a smaller dataset than it started with. Oversampling duplicates records within the minority class and creates a larger dataset.
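To make the difference concrete, here is a minimal sketch in plain Python. It is not how the Alteryx tool is implemented; the function names (`undersample`, `oversample`), the `label` field, and the 90/10 toy dataset are all made up for illustration.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(records, label_key):
    """Bucket records by the value of their class label."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[label_key]].append(rec)
    return groups


def undersample(records, label_key, seed=0):
    """Randomly drop records so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(records, label_key)
    target = min(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(rng.sample(g, target))  # sample without replacement
    return out


def oversample(records, label_key, seed=0):
    """Randomly duplicate records so every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(records, label_key)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)  # keep every original record
        out.extend(rng.choices(g, k=target - len(g)))  # add duplicates
    return out


# Hypothetical imbalanced dataset: 90 "No" records and 10 "Yes" records.
data = [{"id": i, "label": "No"} for i in range(90)]
data += [{"id": 90 + i, "label": "Yes"} for i in range(10)]

under = undersample(data, "label")
over = oversample(data, "label")

print(len(under), sorted(Counter(r["label"] for r in under).items()))
print(len(over), sorted(Counter(r["label"] for r in over).items()))
```

With this toy data, `under` ends up with 20 records (10 per class) and `over` ends up with 180 records (90 per class): undersampling shrinks the dataset, oversampling grows it.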

When using the Oversample tool within Alteryx, using the example workflow for reference:

When summarizing the input:

And the output:

It's clear that the data has actually been undersampled: random samples have been taken from the majority class to match the minority class, rather than duplicate minority records being created.

I would suggest a quick renaming of the tool to "Undersample Field Tool", along with a matching update to the documentation, so as not to confuse new users of the platform.

Kind Regards,

TheOC

IraWatt · ‎08-25-2022

Great spot @TheOC ! 😄

AlteryxCommunityTeam · ‎09-16-2022

cgoodman3 · ‎10-27-2022

How about also adding functionality that lets the user choose whether they want to over- or undersample their data?

TheOC · ‎10-27-2022

Love the idea - I propose a dilemma... what do you then call the tool that can both over- and undersample data? 😂

JamieHankins · ‎04-25-2023

I would also like a tool that provides the option of under- or oversampling.

Correctly Name The Oversample Field Tool