The randomForest package implementation in Alteryx works fine for smaller datasets but becomes very slow on large datasets with many features.
The open-source ranger package (https://arxiv.org/pdf/1508.04409.pdf) could help here.
Along with XGBoost/LightGBM/CatBoost, it would be an extremely welcome addition to the predictive package!
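For anyone hitting the same slowdown from the Python side in the meantime, scikit-learn's random forest already parallelizes tree training across cores, which is where ranger gets much of its speedup too. A minimal sketch, assuming scikit-learn is installed (the dataset and parameters are purely illustrative):

```python
# Sketch: a multi-core random forest in Python, analogous to ranger's
# multi-threaded training. Data here is synthetic, for illustration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

# n_jobs=-1 trains trees on all available cores; this is the main lever
# for large datasets with many features.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(round(clf.score(X, y), 2))
```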
I second your request for XGBoost to be added to the predictive tools.
+1 great idea. I would mention @AshleyK @DrDan if we'd like to raise interest internally...
Random Forest (RIP Breiman) is a lifesaver in predictive work, and the benchmarks below show how much faster the new package is compared to the existing package and some alternatives...
It might also be more productive to create a single topic for all R/Python packages we'd like to see in Alteryx or ones we'd like to improve.
The ranger package definitely needs to be looked at. The randomForest package is the current R package we use that I'm least happy with in terms of its finicky behavior. In addition, there have been a huge number of speed improvements for random forest models since the algorithm was first developed, while the randomForest package is still based on Leo Breiman's and Adele Cutler's original (circa 2001) FORTRAN code. We did look at randomForestSRC a couple of years ago, but at that time we found it was less performant than the original randomForest package.
In terms of XGBoost, we looked at that a couple of years ago as well, but there were implementation issues with it (at that time it didn't work directly with data frames).
Aside from the null-value allergy and the 2GB model size limit (I use a lot of variables), I can't say Alteryx's Random Forest implementation is that bad.
The C5.0 decision tree is a lot finickier in my experience (it's allergic to whitespace both in variable names and in the data, which needs to be looked at), and its graphical output leaves a lot to be desired.
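The whitespace allergy above can be worked around by sanitizing column names before modeling. A small sketch in Python/pandas (the column names are made up; the same idea applies to any tool that chokes on spaces):

```python
# Sketch: strip and replace whitespace in column names before handing
# the data to a whitespace-sensitive modeling tool.
import pandas as pd

df = pd.DataFrame({"annual income": [1, 2], "credit score ": [3, 4]})
df.columns = [c.strip().replace(" ", "_") for c in df.columns]
print(list(df.columns))  # → ['annual_income', 'credit_score']
```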
- As for XGBoost, perhaps the Python implementation would be easier to integrate?
- Deep Forest (https://github.com/kingfengji/gcForest) would be an interesting package to implement as well; it's a tree-based alternative to deep learning.
- KNN and K-Modes (for categorical clustering) would also be great to have; the more options the merrier.
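For anyone who needs KNN before it lands in the predictive tools, scikit-learn already ships it. A minimal sketch (the dataset and neighbor count are illustrative, not a recommendation):

```python
# Sketch: the KNN requested above, via scikit-learn's KNeighborsClassifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # 5 nearest neighbors vote
knn.fit(X_train, y_train)
print(round(knn.score(X_test, y_test), 2))
```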
Big + for fixing the null-value allergy in random forest
++ for deep forest
You are right, @marco_zara, though it's a massively parallelizable algorithm.
When the number of columns (variables) and rows increases, it still takes a lot of time to train a model...
Recently a model of mine at a fintech took approx. 2 hours... a long wait if you need near-real-time learning or active learning...
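When full retraining takes hours, one option is to grow the forest incrementally rather than from scratch. A sketch using scikit-learn's `warm_start` (synthetic data; whether this fits an active-learning loop depends on the use case):

```python
# Sketch: warm_start keeps the already-trained trees and only fits the
# newly requested ones, instead of retraining the whole forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
clf.fit(X, y)            # first 50 trees

clf.n_estimators += 50   # request 50 more trees
clf.fit(X, y)            # only the new trees are trained
print(len(clf.estimators_))  # → 100
```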