11-28-2018 11:37 AM - edited 08-03-2021 11:16 AM
You've trained your model using historical data with a known outcome. (You've also done your due diligence by training many models, comparing them to each other, and selecting the best one.)
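Now it's time to PREDICT, to use your model to score current data with an UNknown outcome. DON'T train and score in the same workflow by feeding the model tool's output straight into the Score tool alongside your new data.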
Doing this will re-train the model every time you score new data, wasting time.
Instead, save the model to a yxdb file with an Output Data tool.
Then, in a separate workflow, read the saved model back in with an Input Data tool when scoring new data.
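For R users, the same train-once, score-many idea can be sketched in plain R with base R's saveRDS() and readRDS(). This only illustrates the pattern. It is not how the Output Data and Input Data tools store a model in a yxdb file. mtcars stands in for both the historical and the new data, and the file name is hypothetical.

```r
# Train once: fit on historical data with a known outcome (mtcars rows 1-24
# stand in for the training data), then save the fitted model to disk.
train <- mtcars[1:24, ]
model <- lm(mpg ~ wt + hp, data = train)
model_path <- file.path(tempdir(), "mpg_model.rds")  # hypothetical file name
saveRDS(model, model_path)

# Score many times: a separate script only reads the saved model back in,
# so nothing is re-trained when new data arrive.
scoring_model <- readRDS(model_path)
new_data <- mtcars[25:32, c("wt", "hp")]  # stands in for data with an unknown outcome
scores <- predict(scoring_model, newdata = new_data)
print(round(scores, 2))
```

The scoring half never calls lm(), which is the point: fitting cost is paid once, and every later scoring run only pays for readRDS() and predict().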
Two other ways to save a model:
Once saved to a yxdb file, is there a way to know how it was generated (what tool and settings)?
Thank YOU!
Gary
@vaughangary Yes - see this article.
Here's an example with a model built with the Forest Model tool:
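As a complementary sketch in plain R: if you can read a saved model back into an R tool, the R object itself often records how it was built. class() names the model type, and for many model classes (lm and glm among them) the call element stores the formula and arguments passed to the fitting function. This assumes the object was fitted with a standard R function. Models built through Alteryx's own macros may carry a less readable call. The data (mtcars) and file path are stand-ins.

```r
# Fit an example model, then save and reload it, as a saved model might be
# read back into an R tool later.
fit <- glm(am ~ wt, data = mtcars, family = binomial())
path <- file.path(tempdir(), "saved_model.rds")  # hypothetical file path
saveRDS(fit, path)
restored <- readRDS(path)

# The class vector tells you which kind of model this is.
print(class(restored))          # "glm" "lm"

# The stored call records the formula and arguments used to fit it.
print(restored$call)

# Model-specific settings are often kept on the object too.
print(restored$family$family)   # "binomial"
```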
Is there a way to output a model object that we built ourselves in an R developer tool?
I want to build my own regression object in R Developer, then output that model object to be used as the model in a Score tool. (I'll start with linear regression, because I just want to speed up the Alteryx linear regression, but I'll also use Lasso, boosting, and other models in the future.)
Thanks,
Ray
@rmboaz I believe if you were to serialize the model object and output it the same way the Alteryx predictive tools do, you'd also need to modify the Score tool. For instance, if you open the Score tool and locate the relevant R tool, you'll see:
allowed.types <- c("glm", "svyglm", "negbin", "randomForest.formula", "rpart", "gbm", "lm", "rxLogit", "rxGlm", "rxLinMod", "rxDTree", "rxDForest", "earth", "naiveBayes", "svm.formula", "nnet.formula", "coxph", "elnet", "glmnet", "cv.glmnet", "C5.0")
That's just one line you'd need to modify. It might be easier to create your own custom Score tool from scratch for your new custom regression object.
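To illustrate why a custom object would be turned away, here is a hypothetical sketch of a class whitelist check built from the allowed.types line quoted above. I haven't confirmed how the Score tool actually applies that vector. Using inherits() is an assumption about the mechanism, and is_scorable and the myRegression class are made-up names.

```r
# Whitelist of model classes, copied from the Score tool's R code.
allowed.types <- c("glm", "svyglm", "negbin", "randomForest.formula", "rpart",
                   "gbm", "lm", "rxLogit", "rxGlm", "rxLinMod", "rxDTree",
                   "rxDForest", "earth", "naiveBayes", "svm.formula",
                   "nnet.formula", "coxph", "elnet", "glmnet", "cv.glmnet", "C5.0")

# Hypothetical helper: TRUE if any of the object's classes is on the list.
is_scorable <- function(model, allowed = allowed.types) {
  inherits(model, allowed)
}

fit <- lm(mpg ~ wt, data = mtcars)
# A hand-built object with a hypothetical custom class.
custom <- structure(list(coefficients = coef(fit)), class = "myRegression")

is_scorable(fit)     # TRUE
is_scorable(custom)  # FALSE
```

A check like this is why adding a new class means editing the Score tool as well, not just serializing the object the same way.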
Thanks for the quick response. I'm afraid I don't know what you mean by 'serialize' the model object. I tried to look at the output from the Alteryx Regression tool, but I couldn't follow what it was outputting.
Ray
In computer science, in the context of data storage, serialization is the process of translating data structures or object state into a format that can be stored or transmitted and reconstructed later.
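In base R, that definition maps directly onto serialize() and unserialize(). A minimal sketch with an arbitrary list standing in for "object state":

```r
# An arbitrary R data structure.
state <- list(name = "example", weights = c(0.5, 1.5, 2.5), fitted = TRUE)

# Serialize: translate the object into a raw (byte) vector that can be stored
# or transmitted. connection = NULL returns the bytes instead of writing them.
bytes <- serialize(state, connection = NULL)
class(bytes)               # "raw"

# Unserialize: reconstruct an identical object from those bytes later.
rebuilt <- unserialize(bytes)
identical(state, rebuilt)  # TRUE
```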
In this context, that means turning the R model object into a format that can be passed out of one R tool and read back into another R tool later. I've adapted the code from the How To: Work With Custom R and Python Models in Alteryx Designer article to demonstrate one way to serialize and unserialize a model object, and put it on the gallery here. You could then turn those R tools into training and scoring macros (tools) by following along with this series.
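For a rough idea of what the two sides can look like, here is a base R sketch. It is not the code from the gallery post or the How To article. It assumes one way of packing a model into a table: serialize the fitted model to raw bytes and hex-encode them into a single text field, then reverse both steps on the scoring side. The column names model_name and model_hex are hypothetical, and the Alteryx-specific input and output calls are left out.

```r
# --- Training side (first R tool) ---
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Serialize the fitted model to raw bytes, then encode the bytes as one hex
# string so the model can travel in an ordinary text field of a data table.
model_to_text <- function(model) {
  paste(as.character(serialize(model, connection = NULL)), collapse = "")
}

# Hypothetical column names, not the layout the Alteryx predictive tools use.
model_table <- data.frame(model_name = "mpg_lm",
                          model_hex = model_to_text(fit),
                          stringsAsFactors = FALSE)

# --- Scoring side (second R tool) ---
# Split the hex string into two-character pairs, convert them back to bytes,
# and unserialize those bytes into the model object.
text_to_model <- function(text) {
  starts <- seq(1, nchar(text), by = 2)
  hex_pairs <- substring(text, starts, starts + 1)
  unserialize(as.raw(strtoi(hex_pairs, base = 16L)))
}

restored <- text_to_model(model_table$model_hex[1])
new_data <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
predict(restored, newdata = new_data)
```

Hex encoding doubles the size of the bytes but needs nothing beyond base R. A base64 package would produce shorter text if field length becomes a concern.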
Hope that helps.
Awesome - thank you for your help