I would like to ask about latency related to custom R and Python code in Alteryx.
Thanks!
Hi @myriam, I can speak to the R part of your question at a high level. Yes, you will experience latency using the R tool with custom code, as R already has a reputation for being slow. The Alteryx engine will run the rest of the workflow quickly, but the R tool hands its code to R rather than to the engine, so do all the prep and blend in Alteryx before the data stream hits the R tool.
How much latency you see will vary considerably based on your machine specs, what packages you're using, what your code looks like (e.g. looping with data.frames, read/write formats, vectorisation), the size of your dataset (both length and especially width), etc. The best ways to minimise latency are to use a big, fast machine, optimise your R code, and avoid passing in more data than you need.
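To illustrate the point about looping with data.frames versus vectorisation, here is a minimal sketch in base R. The `orders` data, its column names, and both function names are made up for the example, and the timings will differ on every machine, so treat it as a way to see the pattern rather than as a benchmark.

```r
# Hypothetical example data: 2,000 rows of order quantities and unit prices.
set.seed(42)
n <- 2000
orders <- data.frame(qty   = sample(1:10, n, replace = TRUE),
                     price = round(runif(n, 1, 100), 2))

# Slow pattern: grow a data.frame one row at a time inside a loop.
# Every rbind() copies the whole result built so far.
loop_total <- function(df) {
  out <- data.frame(total = numeric(0))
  for (i in seq_len(nrow(df))) {
    out <- rbind(out, data.frame(total = df$qty[i] * df$price[i]))
  }
  out$total
}

# Vectorised pattern: one operation over whole columns.
vec_total <- function(df) df$qty * df$price

loop_time <- system.time(slow <- loop_total(orders))["elapsed"]
vec_time  <- system.time(fast <- vec_total(orders))["elapsed"]

stopifnot(isTRUE(all.equal(slow, fast)))  # both approaches give the same totals
cat("loop:", loop_time, "s; vectorised:", vec_time, "s\n")
```

The gap between the two usually widens as the row count grows, because the loop version re-copies an ever larger data.frame on each pass.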
Advanced R has sections on performance that might be useful. I'd love it if other citizen data scientists weighed in here, and please share what you've learned along the way!
I'm going to posit that with R it's not the latency per se; it's the error-checking process. RStudio has some fairly straightforward, foolproof error checking, in that it lets you correct column/function/object names with slight typos. In Alteryx, if you type something wrong (say "setseed" or "set-seed" instead of "set.seed"), it throws an error and you have to start all over again. The problem is the boot time as opposed to the execution time. I'd love it if R stayed in memory (so Alteryx knew that you were going to be running R code and didn't have to relaunch the R tool and its dependencies each time you run the workflow)... Now if I can just get JuliaCall to install in my Alter-R
Hi there
I developed my R code in RStudio, moved it across to the R Tool, made the changes required for it to work and then clicked run. My R code takes 74 mins to run in RStudio and it is still running in Alteryx; I even managed to move house during that time. Needless to say, I'm not very impressed. I know of ways in which I can optimise my code, like running parallel processes or using the data.table package (I hear it is much quicker than working with data.frame, but don't quote me on that one). Unfortunately I need to tidy the data in R rather than before it arrives in the R Tool, but tidying the data doesn't take long at all and that isn't where the latency lies in my code.
The purpose of my project is for one R Tool to process any of the business data that is thrown at it, and I've designed it so that a teammate with no R experience can do the same task that I can. A large part of my R code is error checking. I've created a custom error-checking function to handle this, which works well, but it adds about 20% more lines to the for loops and if statements it features in, of which there are many. The data comes from external sources, so I need R to alert my team to any errors in the data.
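One way the error checking described above might be made cheaper is to test whole columns at once instead of checking each value inside a for loop. The sketch below is only an illustration in base R: the `sales` data, its column names, the three rules, and the `check_sales` function are all hypothetical and would need to be replaced with the real business rules.

```r
# Hypothetical input: column names and rules are placeholders for illustration.
sales <- data.frame(
  id     = c(1, 2, 3, 4),
  amount = c(10.5, NA, -3, 20),
  region = c("North", "South", "East", "Mars"),
  stringsAsFactors = FALSE
)

# Check every row in one vectorised pass and return one row per problem found,
# instead of testing each value with if statements inside a for loop.
check_sales <- function(df, valid_regions = c("North", "South", "East", "West")) {
  # Each rule is a logical vector with one TRUE/FALSE per input row.
  rules <- list(
    "amount is missing"  = is.na(df$amount),
    "amount is negative" = !is.na(df$amount) & df$amount < 0,
    "unknown region"     = !(df$region %in% valid_regions)
  )
  problems <- lapply(names(rules), function(rule) {
    bad <- which(rules[[rule]])
    data.frame(row = bad, id = df$id[bad], problem = rep(rule, length(bad)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, problems)
}

issues <- check_sales(sales)
print(issues)
```

Inside the R tool, a table like `issues` could then be passed to an output anchor (for example with `write.Alteryx(issues, 1)`, assuming output 1 is free) so that teammates see the problems without reading any R.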
I think I will stop my Alteryx run and add some AlteryxMessage commands so I can see where the execution is at a given point.
Any help is greatly appreciated.
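The idea of adding AlteryxMessage commands to see where execution has reached could be sketched roughly as follows. `log_step` and `timed_step` are hypothetical helper names, the two stages are stand-ins for the real code, and the `AlteryxMessage(text, msg.consts$INFO, priority.consts$LOW)` call form is an assumption based on the R tool documentation, so check it against your Alteryx version. Outside Alteryx the helper falls back to `message()`, so the same script still runs in RStudio.

```r
# Report progress in the Alteryx Results window when running in the R tool,
# and fall back to message() elsewhere (e.g. in RStudio).
log_step <- function(text) {
  stamped <- paste(format(Sys.time(), "%H:%M:%S"), text)
  if (exists("AlteryxMessage")) {
    # Assumed call form; verify against the R tool documentation.
    AlteryxMessage(stamped, msg.consts$INFO, priority.consts$LOW)
  } else {
    message(stamped)
  }
  invisible(stamped)
}

# Run one named stage, log its start and elapsed time, and return its result.
timed_step <- function(name, expr) {
  log_step(paste("starting", name))
  elapsed <- system.time(result <- expr)["elapsed"]
  log_step(sprintf("finished %s in %.1f s", name, elapsed))
  result
}

# Hypothetical stages standing in for the real tidy and processing code.
tidy      <- timed_step("tidy", { Sys.sleep(0.1); data.frame(x = 1:5) })
processed <- timed_step("process", transform(tidy, y = x * 2))
```

Wrapping each major stage this way shows both how far the run has got and which stage is eating the time, which is usually the first thing to establish before optimising.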