Hello!
A quite minor, pedantic issue from me today.
Currently, the Oversample Field Tool's naming and configuration suggest that the tool oversamples data:
However, I would argue that the tool undersamples data instead.
Here are a few sources that explain this much better than I can:
And an image taken from Medium:
Effectively, the goal of either technique is to end up with a similar (or the same) number of records in each class. Undersampling takes a random sample from the majority class, so you end up with a smaller dataset than you started with. Oversampling duplicates records within the minority class, so you end up with a larger dataset.
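To make the difference concrete, here is a rough sketch in plain Python. This is not how the Alteryx tool is implemented; the function names `undersample` and `oversample`, the toy "yes"/"no" data, and the fixed seed are all my own assumptions for illustration.

```python
import random
from collections import Counter


def group_by_class(records, label_of):
    """Group records into lists keyed by their class label."""
    groups = {}
    for record in records:
        groups.setdefault(label_of(record), []).append(record)
    return groups


def undersample(records, label_of, seed=0):
    """Randomly drop records so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(records, label_of)
    target = min(len(rows) for rows in groups.values())
    result = []
    for rows in groups.values():
        # Sample without replacement: no record appears twice.
        result.extend(rng.sample(rows, target))
    return result


def oversample(records, label_of, seed=0):
    """Randomly duplicate records so every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(records, label_of)
    target = max(len(rows) for rows in groups.values())
    result = []
    for rows in groups.values():
        result.extend(rows)
        # Sample with replacement to fill the gap with duplicates.
        result.extend(rng.choices(rows, k=target - len(rows)))
    return result


# Hypothetical imbalanced data: 10 "yes" records and 90 "no" records.
data = [("yes", i) for i in range(10)] + [("no", i) for i in range(90)]
label = lambda record: record[0]

under = undersample(data, label)
over = oversample(data, label)

print(len(under), dict(Counter(map(label, under))))  # 20 records: 10 "yes", 10 "no"
print(len(over), dict(Counter(map(label, over))))    # 180 records: 90 "yes", 90 "no"
```

The thing to look at is the record count: undersampling shrinks the 100-record dataset to 20, while oversampling grows it to 180.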
When using the Oversample tool within Alteryx, using the example workflow for reference:
When summarizing the input:
And the output:
It's clear that the data has actually been undersampled: random samples have been taken from the majority class to match the size of the minority class, rather than minority records being duplicated.
I would suggest a quick rename of the tool to "Undersample Field Tool", along with an update to the documentation, to avoid confusing new users of the platform.
Kind Regards,
TheOC