Alteryx Designer Ideas


Please implement the Ranger random forest package



The randomForest package implementation in Alteryx works fine for smaller datasets but becomes very slow for large datasets with many features.

The open-source ranger package could help with this.


Along with XGBoost/LightGBM/CatBoost, it would be an extremely welcome addition to the predictive package!

5 - Atom

I second your request for XGBoost to be added to the predictive tools.

Alteryx Partner

+1 great idea. I would mention @AshleyK @DrDan if we'd like to raise interest internally...


Random forest (RIP Breiman) is a lifesaver in predictive work, and the benchmarks below show how fast the new package is compared to the existing package and some alternatives...






Alteryx Partner

It might also be more productive to create a single topic for all R/Python packages we'd like to see in Alteryx or ones we'd like to improve.


The ranger package definitely needs to be looked at. Of the R packages we currently use, randomForest is the one I'm least happy with because of its finicky behavior. In addition, there have been a huge number of speed improvements for random forest models since the algorithm was first developed, while the randomForest package is based on Leo Breiman's and Adele Cutler's original (circa 2001) FORTRAN code. We did look at randomForestSRC a couple of years ago, but at that time we found it was less performant than the original randomForest package.




As for XGBoost, we also looked at that a couple of years ago, but there were implementation issues with it (it didn't work directly with data frames at that time).

Alteryx Partner

Aside from the null value allergy and the 2GB model size limit (I use a lot of variables), I can't say the Alteryx Random Forest implementation is that bad.

In my experience, the C5 decision tree is a lot more finicky (it's allergic to white space in BOTH variable names and data, which needs to be looked at), and the graphical output leaves a lot to be desired.
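As a workaround for the white space issue, the names and string values can be cleaned before they reach the tool. The snippet below is only a minimal sketch in plain Python; the helper names `sanitize_name` and `sanitize_rows` and the sample record are made up for illustration and are not part of Alteryx or C5.

```python
import re

def sanitize_name(name: str) -> str:
    """Trim a string and replace internal whitespace runs with underscores."""
    return re.sub(r"\s+", "_", name.strip())

def sanitize_rows(rows):
    """Return copies of the rows with sanitized keys and string values."""
    cleaned = []
    for row in rows:
        cleaned.append({
            sanitize_name(key): sanitize_name(value) if isinstance(value, str) else value
            for key, value in row.items()
        })
    return cleaned

# Hypothetical record with spaces in both the field names and the data.
rows = [{"Customer Type": " Small Business ", "Monthly Spend": 120.5}]
print(sanitize_rows(rows))
```

Non-string values are passed through untouched, so numeric fields keep their type.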


- As for XGBoost, perhaps the Python implementation would be easier to implement?


- Deep Forest would be an interesting package to implement as well; it's a tree-based alternative to Deep Learning.


- KNN and K-Modes (for categorical clustering) would also be great to have; the more options, the merrier.

Alteryx Partner

big + for fixing null value allergy in random forest

  • which can be done with a few lines of code actually
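To give an idea of what "a few lines of code" could look like, here is a minimal sketch of one common approach: fill missing numeric values with the column median and missing categorical values with the most frequent value. This is an assumption about what a fix might do, not how the Alteryx tool or the randomForest package actually handles nulls; `impute_column` and the sample columns are hypothetical.

```python
from collections import Counter
from statistics import median

def impute_column(values):
    """Fill None entries with the median (numeric) or the most common value (categorical)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn from; leave the column untouched
    if all(isinstance(v, (int, float)) for v in present):
        fill = median(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

# Hypothetical columns with missing entries.
tenure = [12, None, 30, 24]
plan = ["basic", "premium", None, "basic"]
print(impute_column(tenure))
print(impute_column(plan))
```

Simple imputation like this can bias a model when values are not missing at random, so it is a starting point rather than a full answer.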

++ for deep forest

  • looking forward to it
  • needs Alteryx to be able to utilize multiple cores in parallel, or GPUs maybe?


Alteryx Partner
Unlike deep learning, deep forest uses layers of random forests so it doesn't require GPU to reach decent performance.
Alteryx Partner

You're right, @marco_zara, though it's a massively parallelizable algorithm.

When the number of columns (variables) and rows increases, it still takes a lot of time to model things...


Recently, a model of mine at a fintech took approx. 2 hours... a long wait if you need to do near-real-time learning or active learning...
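To illustrate why I call it parallelizable: each tree is fit on its own bootstrap sample and doesn't depend on any other tree, so the fits can be handed out to workers independently. Below is a toy sketch with one-split trees (stumps) on made-up data; `fit_stump`, `predict`, and the dataset are hypothetical and have nothing to do with the actual Alteryx or ranger code. Threads are used only to show the structure; in pure Python they won't speed up CPU-bound work, so a real speedup needs processes or native code.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fit_stump(args):
    """Fit a one-split tree (stump) on a bootstrap sample of (x, y) pairs."""
    data, seed = args
    rng = random.Random(seed)  # per-tree generator, so results don't depend on scheduling
    sample = [rng.choice(data) for _ in data]  # bootstrap: sample with replacement
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        # Predict the majority class (0 or 1) on each side of the split.
        left_label = round(sum(left) / len(left))
        right_label = round(sum(right) / len(right)) if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]

def predict(forest, x):
    """Majority vote over all stumps."""
    votes = [left if x <= threshold else right for threshold, left, right in forest]
    return round(sum(votes) / len(votes))

# Toy data: the label is 1 when x >= 5.
data = [(x, int(x >= 5)) for x in range(10)]

# Every stump is an independent task, so the pool can fit them in any order.
with ThreadPoolExecutor(max_workers=4) as pool:
    forest = list(pool.map(fit_stump, [(data, seed) for seed in range(25)]))

print(predict(forest, 1), predict(forest, 8))
```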

Alteryx Partner
2 hours to train or to score? Here I'm doing churn prediction models on a 4-year-old i7 with 16GB of RAM. GPUs for machine learning are something in the fantasy realm, especially as nobody in my company knows CUDA or OpenCL. If it wasn't for Alteryx, there is no way I'd be doing ML, and we'd have to rely on consultants instead, so every new feature is welcome...