
Alteryx Machine Learning Discussions


R caret package in R tool - RAM skyrockets when using a parallel processing cluster.

mjf
8 - Asteroid

Hi all

 

I'm having a lot of difficulty understanding a problem I'm facing. I've developed a macro centred on the R caret package, which creates a parallel processing cluster using the doParallel package. input$ncores is read from an interface tool, and the number of cores allocated to the cluster is capped at the number available on the machine via detectCores(). I set up a cross-validation environment and caret uses the cluster to work through it.

 

library(doParallel)

# Cap the worker count at min(cores detected on the machine, interface setting)
cl <- makePSOCKcluster(min(detectCores(), input$ncores))
registerDoParallel(cl)
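
The cross-validation side looks something like the sketch below; the lm method and the formula are placeholders for illustration, not my actual model.

library(caret)
library(ggplot2)  # for the diamonds dataset mentioned below

# 10-fold cross-validation; allowParallel = TRUE lets caret hand the folds to the registered cluster
ctrl <- trainControl(method = "cv", number = 10, allowParallel = TRUE)
fit <- train(price ~ ., data = diamonds, method = "lm", trControl = ctrl)

stopCluster(cl)   # release the workers once training finishes
registerDoSEQ()   # drop back to sequential processing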

 

The problem I face is the memory allocated to the processes my macro initiates. When I run the diamonds dataset (from ggplot2), which is only 3.3 MB, with 10 cross-validation folds, memory usage spikes to more than the 32 GB of RAM my system has. However, I can run the same code in RStudio* and my system's RAM usage hovers at a comfortable 6 GB. The error message I get is

 

unserialize(socklist[[n]]) : error reading from connection

 

I've really trimmed down my macro. It is simply two input tools, the R tool and one output tool. All the functionality of other tools (like Select) is done with a few lines of R code.
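
To give an idea, the Select-style work inside the R tool is roughly like the lines below; the column names are just placeholders.

# Read the incoming stream from the R tool's first input anchor
df <- read.Alteryx("#1", mode = "data.frame")

# Keep only the columns the model needs (the Select tool's job, done in code)
df <- df[, c("carat", "cut", "price")]

# Push the result to output anchor 1
write.Alteryx(df, 1)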

 

Does anybody know why Alteryx soaks up so much more RAM than R itself? Thanks for your time.

 

*Nearly the same. In RStudio I use fread() from data.table to read the diamonds dataset from a CSV file. I also change my user-defined input objects so that the AlteryxPredictive suite of *Input() functions is used differently, by amending the defaults.
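
For reference, the stand-alone read in RStudio is just something like the following (the file path is illustrative):

library(data.table)

# Load the same diamonds data from disk instead of an Alteryx input anchor
diamonds <- fread("diamonds.csv")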
