Hi everyone,
I get errors when running these models and do not understand the reason for them. I have done some searching, but the adjustments I have made so far have not helped much. The data I use is 158MB, about 1.9 million records (1.3 million of them used for model evaluation). I am training the models to determine whether an order converted or not. The error messages and the fields I use are as follows:
Account_categories, quote_day_of_date, geo, region, max_quoted_amount. These variables sum up the types of fields I have in the data: 3 integer types and 7 V_WString / string types. I am using a total of 10 fields and am training 2 models for each algorithm, to see how the model performs on fewer variables and on all variables.
I am also sharing the R code I used to determine the variable importance and an image of the flow (however, I think the errors come from the model itself):
I would appreciate any expert advice on the errors I currently have.
Thank you!
Hi @Roche
I don't have any experience with this error, but I'll try and help troubleshoot. I would start out by testing your workflow with a small sample (maybe a random 10%?) to see whether something about the configuration is causing the problem, or whether it is just a capacity problem with the volume of records.
Let us know how that goes.
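If you would rather take the sample in R than with a sampling tool, a rough sketch could look like the one below. The data frame and its column names are made up for illustration, since I can't see your actual data.

```r
# Minimal sketch: draw a reproducible random 10% sample of the rows.
# `orders` is a hypothetical stand-in for the real order data.
set.seed(42)  # fixed seed so the same rows are drawn on every run

orders <- data.frame(
  converted = sample(c(0L, 1L), 1000, replace = TRUE),
  max_quoted_amount = runif(1000, min = 0, max = 5000)
)

sample_fraction <- 0.10
n_sample <- round(sample_fraction * nrow(orders))

# Pick row indices without replacement, then subset the data frame
sample_rows <- sample(nrow(orders), size = n_sample)
orders_sample <- orders[sample_rows, ]

print(nrow(orders_sample))
```

Raising sample_fraction step by step (0.10, 0.25, 0.50) should show roughly where the failures start.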
Hi @CharlieS
I have run the random forest models with 10% of the data and got no errors. I will let you know about the other models.
Thank you!
Hi @Roche
The random forest algorithm can take a lot of memory. You can check your memory limit by calling memory.limit() in R.
Note that R can use disk as memory: memory.limit(100000) sets the limit to 100000 MB, which gives you ~100GB of memory.
But this won't help if you don't have that much hard drive space.
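A minimal sketch of how that could look in an R script is below. Two assumptions to flag: as far as I know, memory.limit() only has an effect on Windows builds of R older than 4.2.0 (later versions keep it as a stub), so the call is guarded, and `orders` is a made-up stand-in for the real data.

```r
# memory.limit() takes its size in MB, so convert to GiB to see what is requested
mb_to_gib <- function(mb) mb / 1024

requested_mb <- 100000
requested_gib <- mb_to_gib(requested_mb)  # a little under 100 GiB
print(requested_gib)

# Only call memory.limit() where it is expected to work (assumption, see above)
can_set_limit <- .Platform$OS.type == "windows" && getRversion() < "4.2.0"
if (can_set_limit) {
  print(memory.limit())              # current limit in MB
  memory.limit(size = requested_mb)  # request a higher limit
}

# Hypothetical data: check how much memory a data frame takes once loaded
orders <- data.frame(id = 1:1000, amount = runif(1000))
print(format(object.size(orders), units = "Mb"))
```

Checking object.size() on the real data gives a feel for how far the in-memory size is from the 158MB file size.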
Hi @CathyS_Slalom , thank you for your advice. If I give this command in R, how does it then become applicable within Alteryx's R code? I do think that the problem is the memory.
Hi @Roche, you would need to run the command in Alteryx. There are two ways to do that:
1. Write your own code and run the random forest in R or Python from the Developer tab
2. Modify the R code in Forest_Model.yxmc
Hope this helps.
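For option 1, a rough sketch in R might look like this. It assumes the randomForest package is installed, and the data frame, column names, and parameter values are invented for illustration, so treat it as a starting point rather than what the Forest tool does internally.

```r
# Minimal sketch: a random forest with settings that keep memory use down.
# `orders` and its columns are hypothetical stand-ins for the real data.
set.seed(1)
n <- 2000
orders <- data.frame(
  converted = factor(sample(c("no", "yes"), n, replace = TRUE)),
  region = sample(c("north", "south", "east", "west"), n, replace = TRUE),
  max_quoted_amount = runif(n, min = 0, max = 5000),
  stringsAsFactors = FALSE
)

# String fields need to be factors before they can be used as predictors
orders$region <- factor(orders$region)

# Guarded so the script still runs where the package is not installed
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(
    converted ~ region + max_quoted_amount,
    data = orders,
    ntree = 100,     # fewer trees than the package default of 500
    sampsize = 500,  # rows drawn per tree instead of all rows
    nodesize = 25    # larger terminal nodes, so each tree is smaller
  )
  print(fit)
}
```

Smaller ntree and sampsize values and a larger nodesize all reduce how much each run has to hold in memory, at some cost in accuracy.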
Hi, thank you. I have done so as seen in the image:
I am testing to see what the limit should be. I have tried a limit of 100 000, but it yields no data. Neither do limits of 25 000 or 10 000, nor does placing the R tool after the 'create sample' tool with limits of 7000 and 3000. When I test for the limit, the forest tool gives the message "at least one predictor variable should be selected".
Not sure what I am doing wrong. Can you perhaps help me with this?
Thank you!
Rouche
Hi @CharlieS , I have been able to run 2 of the models successfully, but I still struggle with the Random Forest model. At 10% it does run, but not at 50% or 40%. The other models have everything the same in terms of variables, data cleaning etc. I have also tried changing the memory settings in the workflow and am using the AMP engine, but it still does not work.
Just to double check, have you tried not using the AMP engine?
Hi @CharlieS, I have run it with 50% of the data both with AMP and without AMP. I have just run it again (to be sure the new configuration is reflected in the results) and it gives the same main problem every time.
Here is the image of the R code that I used for the memory limit as well.