Please implement the Ranger random forest package

Hello,

 

The randomForest package implementation in Alteryx works fine for smaller datasets but becomes very slow for large datasets with many features.

There is the open-source ranger package (https://arxiv.org/pdf/1508.04409.pdf) that could help with this.

 

Along with XGBoost/LightGBM/CatBoost, it would be an extremely welcome addition to the predictive package!
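For illustration only, a minimal sketch of what the swap could look like in R; `dat` and the target column `label` are hypothetical names, and the ranger call simply mirrors the randomForest one. The practical difference is that ranger has a C++ backend and grows trees on several threads (num.threads), while the randomForest package runs single-threaded.

# Current Alteryx predictive tools build on the randomForest package (single-threaded):
library(randomForest)
rf_fit <- randomForest(label ~ ., data = dat, ntree = 500)   # label assumed to be a factor

# ranger: same model family, C++ backend, multi-threaded:
library(ranger)
rg_fit <- ranger(label ~ ., data = dat,
                 num.trees   = 500,
                 num.threads = 8,            # spread tree growing over several cores
                 importance  = "impurity")   # variable importance for later filtering
pred <- predict(rg_fit, data = dat)$predictions   # scoring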

16 Comments
Atabarezz
13 - Pulsar

Training is the slow part (400K rows x 1,900 variables). Scoring is always lightning fast, no problem there...

marco_zara
8 - Asteroid
1,900 variables is pretty huge; I thought my datasets of around 300 variables were too big already... I can see why you need faster random forests!
Atabarezz
13 - Pulsar

Well, I turned it into an iterative procedure to get faster runs...

 

1. I run a small 5% sample with the full variable set, making sure all labels to classify are in equal ratio

2. Select the most meaningful 50% of the variables from the random forest

3. Re-run with a sample twice as big (10%) and half the variables

 

In 2-3 iterations you roughly know the prime variables needed and an ideal sub-sample size (sketched below)...
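For illustration, a rough R sketch of that loop, using ranger since that is what the idea asks for; `dat` and the class column `label` are hypothetical names, step 1 is read here as stratified sampling that keeps every class at the same proportion, and the fractions follow the steps above.

library(ranger)

# Stratified sample: draw the same fraction from every class so the
# label ratios in the sample match the full data (step 1).
strat_sample <- function(df, label_col, frac) {
  idx <- unlist(lapply(split(seq_len(nrow(df)), df[[label_col]]),
                       function(i) i[sample.int(length(i), ceiling(length(i) * frac))]))
  df[idx, , drop = FALSE]
}

frac <- 0.05                               # start with a 5% sample
vars <- setdiff(names(dat), "label")       # start with the full variable set

for (iter in 1:3) {
  samp <- strat_sample(dat, "label", frac)
  fit  <- ranger(label ~ ., data = samp[, c("label", vars)],
                 num.trees = 500, importance = "impurity")
  imp  <- sort(fit$variable.importance, decreasing = TRUE)
  vars <- names(imp)[seq_len(ceiling(length(imp) / 2))]   # keep the top 50% (step 2)
  frac <- min(frac * 2, 1)                                # double the sample size (step 3)
}
# After 2-3 passes, `vars` holds the prime variables and `frac` a workable sub-sample size.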

marco_zara
8 - Asteroid
Can we hope for some feedback from the devs? It's sad to see most ML algorithms depending on single-core speed in 2019, especially as the hardware that makes them perform best is not something corporate procurement would approve...
AlteryxCommunityTeam
Alteryx Community Team
Status changed to: Accepting Votes
 
mojgan1987
6 - Meteoroid

I fully support the proposal of adding LightGBM, XGBoost and CatBoost.