Hi all
I'm struggling to understand a problem I've run into. I've developed a macro centred on the R caret package, which creates a parallel processing cluster via the doParallel package. input$ncores is read from an Interface tool, and the number of cores allocated to the cluster is capped at the number available on the machine, as reported by detectCores(). I set up a cross-validation environment and caret uses the cluster to work through it.
library(doParallel)
cl <- makePSOCKcluster(min(detectCores(), input$ncores))
registerDoParallel(cl)
The problem I face is the memory allocated to the processes my macro starts. When I run the diamonds dataset (from ggplot2), which is only 3.3 MB, with 10 cross-validation folds, memory usage spikes past the 32 GB of RAM my system has. Yet I can run the same code in RStudio* and RAM usage hovers at a comfortable 6 GB. The error message I get is:
unserialize(socklist[[n]]) : error reading from connection
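
For reference, here is a minimal sketch of what the macro's R tool runs. The model is simplified to method = "lm" as a stand-in for my actual model, and the literal 4 stands in for input$ncores from the Interface tool:

library(caret)
library(doParallel)
library(ggplot2)  # supplies the diamonds dataset

n_cores <- min(detectCores(), 4)  # 4 stands in for input$ncores
cl <- makePSOCKcluster(n_cores)
registerDoParallel(cl)

# 10-fold cross-validation; caret hands each fold to the registered backend
ctrl <- trainControl(method = "cv", number = 10, allowParallel = TRUE)
fit <- train(price ~ ., data = as.data.frame(diamonds),
             method = "lm", trControl = ctrl)

stopCluster(cl)
registerDoSEQ()  # return foreach to sequential mode after training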
I've trimmed the macro right down: it's just two Input tools, the R tool, and one Output tool. Everything other tools (like Select) would do is handled with a few lines of R code.
Does anybody know why Alteryx soaks up so much more RAM than R itself? Thanks for your time.
*Nearly the same code. In RStudio I use fread() from data.table to read the diamonds dataset from a CSV file, and I amend the defaults of my user-defined input objects so the AlteryxPredictive suite of *Input() functions falls back to those values outside Alteryx.
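
Concretely, the RStudio-side substitutions look roughly like this; the CSV path is hypothetical, and I'm assuming (as in my macro) that the *Input() helpers' second argument is the default they fall back to when the code runs outside Alteryx:

library(data.table)
library(AlteryxPredictive)

# stand-in for the Alteryx Input tool; the CSV path is hypothetical
diamonds <- fread("diamonds.csv")

# outside Alteryx the *Input() helpers return the supplied default;
# the amended default core count here (4) is illustrative
ncores <- numericInput('%Question.ncores%', 4)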