Idea:
Some well-known scoring methods use optimally binned variables for added robustness. Let's add this capability to Alteryx.
Rationale:
Here's a basic reference on why to do this: http://documents.software.dell.com/statistics/textbook/optimal-binning
Current status in Alteryx, as far as I'm aware:
The Tile tool, or the Multi-Field Binning tool (which does the same task as the Tile tool on multiple fields), splits variables by 5 methods:
Equal Records or Intervals or Sums
Smart Tile
Unique Value
Manual
Unfortunately, "equal something" binnings are a bad idea, because the values are categorized "blindly", irrespective of the effect on the predictive power of the models.
What to do:
What's needed is to bin both numerical and categorical variables optimally, such that the Weight of Evidence (WoE) values of the bins follow a monotonically increasing or decreasing pattern, or at most a V- or U-shaped "convex" structure.
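To make the WoE requirement concrete, here is a minimal Python sketch of how the check could look once a variable is already binned. It is only an illustration, not a proposed implementation: the function names `woe_by_bin` and `is_monotone` are hypothetical, the sign convention ln(%good / %bad) and the 0.5 smoothing of the counts are my assumptions, and the sample data is made up.

```python
import math

def woe_by_bin(bins, targets):
    """Return {bin_label: WoE} for a binary target (1 = bad, 0 = good).

    Assumed convention: WoE = ln(share of goods in bin / share of bads in bin).
    """
    counts = {}
    for b, t in zip(bins, targets):
        good_bad = counts.setdefault(b, [0, 0])
        good_bad[t] += 1  # index 0 counts goods, index 1 counts bads
    total_good = sum(c[0] for c in counts.values())
    total_bad = sum(c[1] for c in counts.values())
    woe = {}
    for b, (good, bad) in counts.items():
        # Assumed smoothing: add 0.5 to each count so a pure bin does not give log(0).
        pct_good = (good + 0.5) / total_good
        pct_bad = (bad + 0.5) / total_bad
        woe[b] = math.log(pct_good / pct_bad)
    return woe

def is_monotone(values):
    """True if the sequence is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# Made-up example: three ordered bins with a rising bad rate.
bins = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
targets = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
woe = woe_by_bin(bins, targets)
ordered_woe = [woe[b] for b in sorted(woe)]
monotone = is_monotone(ordered_woe)
```

A binning tool could merge adjacent bins until `is_monotone` holds for the ordered WoE values; the merging strategy itself is left open here.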
Quick win:
Without constraining ourselves to the monotone or convex cases, the easiest practice would be to run a C4.5 or CHAID tree algorithm (which produces multiple splits rather than the binary splits of CART) on a single variable, with the target as the dependent variable. The resulting terminal nodes will be the bins we are looking for. Doing this for multiple variables at once is the key feature of the tool to be built.
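As a rough sketch of the quick win, the Python below grows a small tree on one numeric variable at a time and uses the leaf boundaries as bin cut points, then repeats this for several variables. Please note the stand-in: this is a simple recursive binary split on entropy gain, not C4.5 or CHAID, and it handles numeric variables only. The names `tree_cut_points`, `best_split`, `max_depth` and `min_leaf`, and the sample columns, are all hypothetical.

```python
import math
from bisect import bisect_right

def entropy(pos, n):
    """Binary entropy (in bits) of a node with `pos` positives out of `n` records."""
    if n == 0 or pos == 0 or pos == n:
        return 0.0
    p = pos / n
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(pairs, min_leaf):
    """pairs: (x, y) tuples sorted by x. Return (gain, index) of the best split, or None."""
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    parent = entropy(total_pos, n)
    best = None
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between two equal values
        if i < min_leaf or n - i < min_leaf:
            continue  # would create a leaf that is too small
        child = (i * entropy(left_pos, i)
                 + (n - i) * entropy(total_pos - left_pos, n - i)) / n
        gain = parent - child
        if gain > 1e-12 and (best is None or gain > best[0]):
            best = (gain, i)
    return best

def tree_cut_points(xs, ys, max_depth=2, min_leaf=2):
    """Return sorted cut points; the intervals between them are the bins."""
    cuts = []

    def grow(segment, depth):
        found = best_split(segment, min_leaf) if depth > 0 else None
        if found is None:
            return  # this segment becomes a leaf, i.e. one bin
        i = found[1]
        cuts.append((segment[i - 1][0] + segment[i][0]) / 2)
        grow(segment[:i], depth - 1)
        grow(segment[i:], depth - 1)

    grow(sorted(zip(xs, ys)), max_depth)
    return sorted(cuts)

# Made-up data: bin several variables against the same binary target in one pass.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "age": [1, 2, 3, 4, 5, 6, 7, 8],
    "income": [10, 20, 30, 40, 50, 60, 70, 80],
    "flat": [3, 3, 3, 3, 3, 3, 3, 3],
}
cuts_by_var = {name: tree_cut_points(col, target) for name, col in columns.items()}
age_bins = [bisect_right(cuts_by_var["age"], x) for x in columns["age"]]
```

Here each variable gets its own cut points, and `bisect_right` maps a value to its bin index. A variable with no useful split (like `flat`) ends up as a single bin.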
Clients:
This capability is sought by risk management departments building robust, stable, Basel-compliant models in the financial industry, especially by banks.