
Alteryx Designer Desktop Ideas

Please implement the Ranger random forest package

Hello,

The randomForest package implementation in Alteryx works fine for smaller datasets but becomes very slow for large datasets with many features.

The open-source ranger package (https://arxiv.org/pdf/1508.04409.pdf) could help with this.

Along with XGBoost/LightGBM/CatBoost, it would be an extremely welcome addition to the predictive package!

16 Comments
J_Mortensen
5 - Atom

I second your request for XGBoost to be added to the predictive tools.

Atabarezz
13 - Pulsar

+1 great idea. I would mention @AshleyK @DrDan if we'd like to raise interest internally...

Random Forest (RIP Breiman) is a lifesaver in predictive work, and the benchmarks below show how fast the new package is compared to the existing package and some alternatives...

[Image: Dan2,.jpg]

marco_zara
8 - Asteroid

It might also be more productive to create a single topic for all R/Python packages we'd like to see in Alteryx or ones we'd like to improve.

DrDan
Alteryx Alumni (Retired)

The ranger package definitely needs to be looked at. Of the R packages we currently use, randomForest is the one I'm least happy with because of its finicky behavior. Plus, there have been a huge number of speed improvements for random forest models since the algorithm was first developed, while the randomForest package is based on Leo Breiman and Adele Cutler's original (circa 2001) FORTRAN code. We did look at randomForestSRC a couple of years ago, but at that time we found it was less performant than the original randomForest package.

 

Dan

DrDan
Alteryx Alumni (Retired)

As for XGBoost, we looked at that a couple of years ago as well, but there were implementation issues with it (it didn't work directly with data frames at that time).

marco_zara
8 - Asteroid

Aside from the null value allergy and the 2GB model size limit (I use a lot of variables), I can't say the Alteryx Random Forest implementation is that bad.

In my experience the C5 decision tree is a lot more finicky (it's allergic to white space in BOTH variable names and data, which needs to be looked at), and its graphical output leaves a lot to be desired.

- As for XGBoost, perhaps the Python implementation would be easier to integrate?

- Deep Forest (https://github.com/kingfengji/gcForest) would be an interesting package to implement as well; it's a tree-based alternative to Deep Learning.

- KNN and K-Modes (for categorical clustering) would also be great to have; the more options the merrier.

Atabarezz
13 - Pulsar

big + for fixing null value allergy in random forest

  • which can be done with a few lines of code actually
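As a rough illustration of the "few lines of code" point, here is a minimal sketch of one common workaround: filling nulls before the data reaches the model, using the column median for numeric fields and the most frequent value for everything else. This is an assumption about what such a fix could look like, not how the Alteryx tool or the randomForest package handles missing data; `impute_missing` and the sample column names are hypothetical.

```python
from statistics import median, mode

def impute_missing(rows):
    """Return a copy of rows (a list of dicts) with None values filled in.

    Numeric columns get the column median; all other columns get the
    most frequent value. The input rows are left unchanged.
    """
    # Preserve the order in which columns first appear.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    fill = {}
    for col in columns:
        present = [row[col] for row in rows if row.get(col) is not None]
        if not present:
            continue  # column is entirely missing; nothing to learn from
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        fill[col] = median(present) if numeric else mode(present)
    return [
        {
            col: row.get(col) if row.get(col) is not None else fill.get(col)
            for col in columns
        }
        for row in rows
    ]

# Hypothetical sample data with one missing value per column.
rows = [
    {"age": 30, "plan": "a"},
    {"age": None, "plan": "a"},
    {"age": 50, "plan": None},
    {"age": 40, "plan": "b"},
]
cleaned = impute_missing(rows)
```

Median and mode are crude choices, but they are enough to stop a model from rejecting the data outright; a column that is missing everywhere is left as None.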

++ for deep forest

  • looking forward to it
  • needs Alteryx to be able to utilize multiple cores in parallel, or GPUs maybe?

 

marco_zara
8 - Asteroid

Unlike deep learning, deep forest uses layers of random forests, so it doesn't require a GPU to reach decent performance.

Atabarezz
13 - Pulsar

You are right, @marco_zara, though it's a massively parallelizable algorithm.

When the number of columns (variables) and rows increases, it still takes a lot of time to model things...

A recent model of mine at a fintech takes approx. 2 hours... a long wait if you need to do near-real-time learning or active learning...
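The parallelism point rests on the fact that each tree in a forest is fit on its own bootstrap sample, independently of the others, so the fits can be handed to separate workers. The sketch below shows that structure with one-split trees (stumps) on a single feature. It is a toy under stated assumptions, not the ranger or Alteryx implementation; `fit_stump`, `fit_forest`, and `predict` are hypothetical names. Python threads are used only to show the independence of the tasks; for pure-Python code they will not give a real speedup, which is why production packages do this work in native code.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fit_stump(sample):
    """Fit a one-split tree on (x, label) pairs.

    Returns the threshold t that minimizes errors for the rule
    "predict 1 when x > t".
    """
    best = None
    for threshold in sorted({x for x, _ in sample}):
        errors = sum((x > threshold) != bool(y) for x, y in sample)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def fit_forest(data, n_trees=25, seed=0, workers=4):
    """Fit n_trees stumps, each on its own bootstrap sample, in parallel."""
    rng = random.Random(seed)
    # Draw all bootstrap samples up front so results are reproducible.
    samples = [[rng.choice(data) for _ in data] for _ in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each fit depends only on its own sample, so order does not matter.
        return list(pool.map(fit_stump, samples))

def predict(thresholds, x):
    """Majority vote across all stumps."""
    votes = sum(x > t for t in thresholds)
    return int(votes * 2 > len(thresholds))

# Toy data: label is 1 when x is greater than 5.
data = [(x, int(x > 5)) for x in range(11)]
forest = fit_forest(data)
```

Because the trees share nothing while training, adding cores shortens wall-clock time roughly in proportion, up to the number of trees; scoring can be split across rows in the same way.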

marco_zara
8 - Asteroid

2 hours to train or score? Here I'm doing churn prediction models on a 4-year-old i7 with 16GB of RAM. GPUs for machine learning are in the fantasy realm, especially as nobody in my company knows CUDA or OpenCL. If it weren't for Alteryx, there is no way I'd be doing ML; we'd have to rely on consultants instead, so every new feature is welcome...