Alteryx Machine Learning Discussions


R caret package in the R tool - RAM skyrockets when using a parallel processing cluster.

mjf
8 - Asteroid

Hi all

I'm having a lot of difficulty understanding a problem I'm facing. I've developed a macro centred on the R caret package, which creates a parallel processing cluster using the doParallel package. input$ncores is read from an interface tool, and the number of cores allocated to the cluster is capped at the number available on the machine, detectCores(). I set up a cross-validation environment and caret uses the cluster to work through it.

library(doParallel)

# Cap the cluster at what the machine has; input$ncores comes from the interface tool
cl <- makePSOCKcluster(min(detectCores(), input$ncores))
registerDoParallel(cl)
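
For context, the cross-validation side is roughly this, a minimal sketch that carries on from the cluster above; the model here is just a placeholder, not the one my macro actually fits:

library(caret)
library(ggplot2)   # for the diamonds dataset

# 10-fold cross-validation; with a cluster registered, caret farms the folds out to the workers
ctrl <- trainControl(method = "cv", number = 10, allowParallel = TRUE)

fit <- train(price ~ ., data = diamonds,
             method = "lm",        # placeholder model, not what the macro really uses
             trControl = ctrl)

stopCluster(cl)       # shut the workers down and release their memory
registerDoSEQ()       # return to sequential execution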

 

The problem I'm facing is the memory allocated to the processes my macro initiates. When I run the diamonds dataset (from ggplot2), which is only 3.3 MB, with 10 cross-validation folds, I get a massive spike in memory usage - more than my system's 32 GB of RAM. However, I can run the same code in RStudio* and my RAM usage hovers at a comfortable 6 GB. The error message I get is

unserialize(socklist[[n]]) : error reading from connection

 

I've really trimmed down my macro. It is simply two Input tools, the R tool, and one Output tool. All the functionality of other tools (like Select) is done with a few lines of R code.
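
By "a few lines of R code" I mean this kind of thing (column names are just illustrative):

# Read from the first input anchor and keep only the fields the model needs (Select-equivalent)
df <- read.Alteryx("#1", mode = "data.frame")
df <- df[, c("carat", "cut", "color", "clarity", "price")]   # illustrative columns
write.Alteryx(df, 1)   # write the result to the first output anchor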

 

Does anybody know why Alteryx soaks up so much more RAM than R itself? Thanks for your time.

*Nearly the same. In RStudio I use fread() from data.table to read the diamonds dataset from a CSV file. I also change my user-defined input objects so that the AlteryxPredictive suite of *Input() functions is used differently, by amending their defaults.
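
Concretely, the RStudio loading step is just something like this (the file path is illustrative):

library(data.table)
diamonds <- fread("diamonds.csv")   # illustrative path; in the macro this comes straight from an input anchor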
