
Alteryx Designer Discussions


Controlling Oversampling Method

Asteroid

Dear Alteryx Community,

 

I am using the Oversample Field tool to handle my imbalanced dataset and have a few questions.

 

  • Shouldn't this tool be called undersampling, since it is the majority-class records that get deleted? I tried configuring it the other way around to see whether minority-class records would be created artificially to balance the dataset, but I did not observe that. So this tool looks to me like undersampling. Any comments on that?
  • When I use the Oversample tool, part of the dataset gets deleted. To get statistically significant and reliable results, I need to repeat the sampling several times (for example, 10 times), each time with a different outcome. I expected to find a seed value to control the sampling, but could not find one. How can I better control the oversampling in Alteryx so that I can run experiments with different sampling outcomes?
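As an illustration of what "controlling the sampling with a seed" means, here is a minimal sketch in plain Python (not Alteryx, and not a claim about how the Oversample tool works internally). The function `undersample` and the field names `target`/`id` are hypothetical. Each experiment gets its own `random.Random(seed)` generator, so the same seed always reproduces the same sample and different seeds give different samples.

```python
import random

def undersample(records, label_key, majority_value, seed):
    """Randomly drop majority-class records until both classes have equal counts.

    `records` is a list of dicts; `label_key` and `majority_value` are
    hypothetical names for the target field and its majority value.
    """
    rng = random.Random(seed)  # dedicated generator: same seed -> same sample
    majority = [r for r in records if r[label_key] == majority_value]
    minority = [r for r in records if r[label_key] != majority_value]
    kept = rng.sample(majority, k=len(minority))  # sample without replacement
    return kept + minority

# Toy data: 20 'yes' and 5 'no' records, each with an id.
data = [{"id": i, "target": "yes"} for i in range(20)] + \
       [{"id": 100 + i, "target": "no"} for i in range(5)]

# Ten experiments, each with its own seed and therefore its own sample.
runs = {seed: undersample(data, "target", "yes", seed) for seed in range(10)}
for seed, sample in runs.items():
    kept_ids = sorted(r["id"] for r in sample if r["target"] == "yes")
    print(seed, kept_ids)
```

Re-running `undersample(data, "target", "yes", 3)` returns exactly the same records as `runs[3]`, which is what makes each experiment reproducible.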

 

I would appreciate your quick thoughts.

 

Best,

Atamert


Alteryx

Here is the help documentation for the Oversample Field tool: https://help.alteryx.com/current/designer/oversample-field-tool

 

You are oversampling the underrepresented value. From the documentation: "For example, in the case of untargeted direct mail campaigns, it is not uncommon to find that 2% of potential prospects respond favorably to an appeal, while 98% do not. In this case, predictive models have a difficult time distinguishing the signal from the noise since the cost of classifying all potential prospects in the 'no' category will nearly always be correct."
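To make the 2%/98% point concrete, here is a small plain-Python sketch (the `accuracy` helper is hypothetical, written for this example). A "model" that always predicts "no" scores 98% accuracy while finding none of the responders, which is why the imbalance has to be addressed before modelling.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# 2% responders ("yes") and 98% non-responders ("no"), as in the mail-campaign example.
y_true = ["yes"] * 2 + ["no"] * 98
# A trivial "model" that always predicts the majority class.
y_pred = ["no"] * len(y_true)

acc = accuracy(y_true, y_pred)
# Recall on 'yes': share of actual responders the model identified.
recall_yes = sum(t == p == "yes" for t, p in zip(y_true, y_pred)) / y_true.count("yes")
print(f"accuracy={acc:.2f}, recall on 'yes'={recall_yes:.2f}")  # accuracy=0.98, recall on 'yes'=0.00
```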

Asteroid

Dear @BrandonB,

 

Thank you for your reply, although I had already read that documentation.

 

My point is that, to increase the share of the underrepresented value in the whole dataset, the Alteryx Oversample tool deletes records of the majority class. To me, this is undersampling.
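The terminology question can be seen by contrasting the two textbook ways of reaching a 50/50 balance. The sketch below is plain Python with hypothetical helper names; "random oversampling" here means simple duplication of minority records drawn with replacement (not SMOTE or any synthetic-record method), and nothing here describes Alteryx's internal implementation. Both approaches reach 50% minority share, but undersampling shrinks the dataset while oversampling grows it.

```python
import random

def random_undersample(majority, minority, rng):
    """Balance by deleting majority records (what the poster observes)."""
    return rng.sample(majority, k=len(minority)) + minority

def random_oversample(majority, minority, rng):
    """Balance by duplicating minority records, drawn with replacement."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra

rng = random.Random(42)
majority = ["yes"] * 1000
minority = ["no"] * 100

under = random_undersample(majority, minority, rng)
over = random_oversample(majority, minority, rng)
print(len(under), under.count("no") / len(under))  # 200 0.5
print(len(over), over.count("no") / len(over))     # 2000 0.5
```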

 

Setting aside this minor question of terminology, what I am really interested in is how to control and see which records the tool deletes, so that I can create different datasets.

 

Imagine you have 1000 'yes' and 100 'no' records. To reach a 50%-50% balance, the Oversample tool would delete 900 'yes' records. I would like the set of 900 deleted records to change in each iteration, so that each iteration produces a unique dataset with 100 'yes' and 100 'no' records. How can this be done in Alteryx, with the Oversample tool or some other tool? For me, this is the critical question.
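For the 1000 'yes' / 100 'no' example, one way to guarantee unique datasets (rather than relying on different seeds, whose samples may overlap) is to shuffle the 'yes' records once and split them into non-overlapping chunks of 100. Since 1000 / 100 = 10, this yields exactly 10 balanced datasets, each pairing a distinct chunk of 'yes' records with all 100 'no' records, and every 'yes' record is used exactly once. This is a plain-Python sketch with a hypothetical function name, not an Alteryx feature.

```python
import random

def disjoint_balanced_sets(yes_records, no_records, seed):
    """Split the majority class into non-overlapping chunks the size of the
    minority class, and pair each chunk with all minority records."""
    rng = random.Random(seed)
    shuffled = yes_records[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    k = len(no_records)
    n_sets = len(shuffled) // k        # 1000 // 100 = 10 full chunks
    return [shuffled[i * k:(i + 1) * k] + no_records for i in range(n_sets)]

yes_records = [("yes", i) for i in range(1000)]
no_records = [("no", i) for i in range(100)]

datasets = disjoint_balanced_sets(yes_records, no_records, seed=7)
print(len(datasets))                   # 10
print([len(d) for d in datasets][:3])  # [200, 200, 200]
```

If the majority count is not a multiple of the minority count, the leftover records (`len(shuffled) % k`) are simply not used in this sketch.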

 

Best,

Atamert
