Alteryx Designer Desktop Discussions

SOLVED

Multi threading in R predictive tools

kcmBens
5 - Atom

Hello!

I am running various machine learning models in both Alteryx and RStudio. I much prefer Alteryx, because in one workflow I can prepare the data for predictive analysis, train a model, and test it out of sample. However, I can make better use of my virtual server environment from R with the foreach and doParallel packages. Alteryx seems to use only one R worker, so I would like to know whether there is a way to get similar behavior in Alteryx, where the job is split evenly over many workers and the results are combined at the end.

For instance, with randomForest I built a separate set of trees on each of many workers in every manual iteration, then recombined them into one model using the combine function in the randomForest package. Can I do something similar in Alteryx?
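For reference, the R-side pattern described above can be sketched roughly as follows. This is a minimal illustration of the foreach/doParallel approach in plain R, not Alteryx functionality. It assumes the randomForest, foreach, and doParallel packages are installed, uses the built-in iris data as a stand-in for real data, and the names n_workers and trees_per_worker are made up for the example.

```r
library(randomForest)
library(foreach)
library(doParallel)

n_workers <- 4            # assumption: set this to the number of cores to use
trees_per_worker <- 125   # each worker grows this many trees (4 x 125 = 500)

# Start a local cluster and register it as the foreach backend
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Grow one sub-forest per worker, then merge them into a single model.
# randomForest::combine is namespaced because other packages also
# export a function called combine.
rf <- foreach(ntree = rep(trees_per_worker, n_workers),
              .combine = randomForest::combine,
              .multicombine = TRUE,
              .packages = "randomForest") %dopar% {
  randomForest(Species ~ ., data = iris, ntree = ntree)
}

stopCluster(cl)

# Total number of trees in the merged forest
print(rf$ntree)
```

Note that a forest merged with combine does not carry an out-of-bag error rate or confusion matrix (those components are NULL in the combined object), so the model has to be evaluated on held-out data.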

 

I have 32 GB of memory and 8 cores available that I'd like to take advantage of.

Thanks!!

KevinP
Alteryx Alumni (Retired)

@kcmBens Alteryx uses multi-threaded processing in most places where it makes sense to do so. This includes most Spatial tools, CASS, Sort, Unique, Download, and the Join tools, among others. However, a number of tools are not multi-threaded. In most cases this is because the tools are not CPU intensive and actually perform worse when multi-threaded, because of the overhead involved. A good example is the Formula tool: while it can be (and at one time was) multi-threaded, we found that the overhead of multiple threads was higher than the time it took to evaluate the formulas.

Regarding the Predictive tools specifically: when using Revolution, the tools are multi-threaded because the R packages use a multi-threaded MKL. With standard open source R, though, only the Boosted Model tool is currently multi-threaded. This may change in the future, as development has recognized that other predictive tools could benefit from being multi-threaded too.

kcmBens
5 - Atom

Thank you @KevinP. One last question, please: is there a configuration panel in Alteryx that would let me adjust this behavior? I want Alteryx on my virtual server environment to use all of the available resources (if it doesn't already do so, of course).

Thanks again for your help.

KevinP
Alteryx Alumni (Retired)

@kcmBens There are some options for memory usage and thread count in the Alteryx System Settings, in the Engine > General section. However, these options are usually already set to the correct values based on your CPU and RAM resources. As a note, the 'Default number of processing threads' option should be set to the number of processor cores plus 1. For example, if you have an 8-core processor it should be set to 9.

The memory options on this configuration screen are 'Default sort/join memory usage (MB)' and 'Memory limit per Anchor (KB)'. The first option determines the minimum amount of memory used when performing sort and join operations. We usually do not recommend adjusting this value unless available memory on the server/workstation is limited, in which case we usually decrease the value based on the amount of available RAM.

The 'Memory limit per Anchor (KB)' option determines how much memory a tool anchor can use to store Browse Everywhere data, which is used to display result data after each tool. The amount of memory per anchor influences how much result data can be stored and displayed. Increasing the value increases the number of rows and fields that can be shown, but keep in mind that this value is per tool anchor, so the more tools you have and the more connections per tool, the more memory this functionality uses. As such, we recommend keeping the value fairly small.
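As a small illustration of the "cores plus 1" rule above, the value can be computed in R with the base parallel package. This is only a sketch for working out the number. The setting itself is still entered in Alteryx System Settings. The thread does not say whether physical or logical cores are meant, so counting physical cores here is an assumption, and suggested_threads is a made-up name.

```r
library(parallel)

# Physical core count (assumption: the rule refers to physical cores).
# detectCores() can return NA on some platforms, so fall back to the
# logical count and finally to 1.
cores <- detectCores(logical = FALSE)
if (is.na(cores)) cores <- detectCores()
if (is.na(cores)) cores <- 1L

# Rule of thumb from the reply above: processor cores plus 1
suggested_threads <- cores + 1
cat("Cores:", cores, " Suggested processing threads:", suggested_threads, "\n")
```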
