Dear Community,
Probably a simple question for advanced users but not for a young Padawan like me ^^
I have a file with 5000 lines, of which roughly 85% have a “NOK” output and 15% an “OK” output (quite unbalanced). To train my model properly, I would like to feed it a balanced sample (50% NOK & 50% OK).
I start by isolating the “OK” lines through a filter and I use a random sampling on the “NOK” data.
The thing is that I have to define the size of the “NOK” sample manually, based on the number of “OK” lines.
So I would like to use the number of “OK” samples as an input, to request the same number of “NOK” samples.
I have found replies that pass values in as inputs, but they deal with batches rather than a simple workflow.
Thanks,
Pierre-Louis
Hi Pierre-Louis,
You probably want to have a look at the Oversample Field tool. It sounds like exactly the result you are after.
https://help.alteryx.com/2018.4/Oversample_Field.htm
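For anyone who wants to see the logic spelled out, or who needs to do the same thing outside Alteryx, here is a minimal Python sketch of the manual approach from the question: keep every “OK” row, count them, and draw the same number of “NOK” rows at random. It is not how the Oversample Field tool is implemented. The function name `balance_by_downsampling`, the column name `output`, and the generated sample data are all assumptions made up for illustration.

```python
import random


def balance_by_downsampling(rows, label_key="output", seed=0):
    """Keep all rows of the smaller class and an equal-sized random sample of the larger one."""
    ok = [r for r in rows if r[label_key] == "OK"]
    nok = [r for r in rows if r[label_key] == "NOK"]

    # Whichever class is smaller sets the sample size for the other one.
    minority, majority = sorted((ok, nok), key=len)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = rng.sample(majority, len(minority))  # sampling without replacement

    balanced = minority + sampled
    rng.shuffle(balanced)  # avoid having all rows of one class first
    return balanced


# Hypothetical data: 5000 rows, 15% "OK" and 85% "NOK".
rows = [{"id": i, "output": "OK" if i % 20 < 3 else "NOK"} for i in range(5000)]
balanced = balance_by_downsampling(rows)
```

The sample size is never typed in by hand: it is taken from the count of the smaller class, which is the "use the number of OK as an input" step from the question.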
Thanks a lot @paul_houghton! The option was not ticked on my Alteryx update... so I was not aware of this option :-)
No problem. There are a lot of tools in Alteryx, so knowing which one works best in a given situation can be a challenge. Glad that helped.