Solved! Go to Solution.
Hi,
I'm confused as to how this works specifically. What are you parsing by when you add the text/column function?? The report that comes out from that output is not broken down in a way where you can easily parse out what variable is tied to what coefficient, etc.
This seems like a pretty basic functionality that's missing in Alteryx. You would think it's obvious that we want an easy way to use the regression output for further calculations (e.g., predicting what model says would be variable given x, y, z, etc.) Am I missing something?
I have this macro in my old email address which has gone, but the solution is akin to this one:
http://community.alteryx.com/t5/Advanced-Analytics/Spline-Tool-Output/td-p/325
Perhaps @DrDan can post the relevant R code we can turn into a macro as this link doesn't work for Linear Regression.
Chris
Attached is the macro to which Chris is referring. The input to the macro is an Alteryx data stream containing a model created by one of the standard Alteryx predictive modeling tools (e.g., Linear Regression) and the output is a table that consists of the names of the model coefficients along with the corresponding coefficient values. In addition, below is the R code within the macro (with added comments):
# Read in the Alteryx stream containing the model object
the.model <- read.Alteryx("#1")
# Make sure that it is in fact a model object, and return an error if it is not
if (!all(names(the.model) %in% c("Name", "Object")))
stop.Alteryx("A model object was not provided")
# The model object has been serialized as a string, which on being read from Alteryx is
# converted to an R factor. The line of code first coerces the serialized model string to
# a string (in R a "character" data type) and the unserialized the model object
mod.obj <- unserializeObject(as.character(the.model$Object[1]))
# From the first test error check we know the Alteryx stream contained a model, but not
# all models have coefficients, so test to see if there are model coefficients, and throw an
# error otherwise
if (!is.null(mod.obj$coefficients)) {
the.coefs <- mod.obj$coefficients
write.Alteryx(data.frame(Variable = names(the.coefs), Coefficient = the.coefs))
} else {
stop.Alteryx(paste("A model of class", class(mod.obj)[1], "does not have coefficients"))
}
This code and the corresponding macro will work for the Linear, Logistic, Count, and Gamma Regression tools. However, if the objective is to calculate predicted values from the model, then extracting the coefficients and using them in a formula tool (or in some other tool) is strongly discouraged. The reasons for this are: (1) the process of translating a set of model coefficients to a formula is often prone to serious error (particularly when categorical variables are used) and (2) doing this is likely to be a waste of time on the part of the user since the Score tool will acurately calculate fitted values at the very small cost of dragging one additional tool onto the canvas.
It turns out you can expose anything related to an R model object to Alteryx using an R tool, whether that is needed, or is even a good choice, is another matter.
Hello DrDan,
Thanks for the macro. it is very helpful.
Is there any way to fetch the values of "R square", "Adjusted R square" and the "p-values of the corresponding coefficients" as well in the macro you have provided. I tried, however no luck.
Request you to kindly help for the same.
Thanks,
Jai
@jaituteja, sorry for a somewhat slow response. However, I do have a fairly complete one.
The issue you are likely running into is that the quantities you want are not stored in the model object, they are calculated by the summary methods for the linear regression model object. It turns out that the summary method also produces an object (it turns out everything in R is an object). One trick is R-squared and adjusted R-squared are scalars, while the p-values are a vector, so getting them output to Alteryx properly is a bit tricky. In the code below, I just don't extract the p-values, but the full coefficient summary table and write it to one Alteryx table (you can select only the p-values using a select tool in Alteryx), and then I get the R-squared and adjusted R-Squared values and write them to another table. Here is the R code that goes into an R tool:
# Read in the data stream containing the R lm model object and unserialize the object
mod_df <- read.Alteryx("#1")
mod_obj <- unserializeObject(as.character(mod_df$Object[1]))
# Get the summary object for the model, all the statistics of interest are in the summary
the_summary <- summary(mod_obj)
# Prepare the full coefficient results table, which includes the p-values
coef_table <- as.data.frame(the_summary$coefficients)
# Add the coefficient names to the table
coef_table <- cbind(data.frame(var = row.names(coef_table)), coef_table)
# Provide R friendly names to the table fields
names(coef_table) <- c("Variable", "Coefficient", "StdError", "t_value", "p-value")
# Write the coefficient table out to Alteryx in the first R tool output
write.Alteryx(coef_table)
# Write out the R-Squared and Adjusted R-Squared values to the second R tool output as a data frame with one row and two columns
write.Alteryx(data.frame(R_Sqr = the_summary$r.squared, Adj_R_Sqr = the_summary$adj.r.squared), 2)
In terms of the Alteryx macro, the original Model Coefficients macro is a good template, with the exception that there will now need to be two macro output tools, one for the coefficient table, the other for the R-squared values.
Dan
hi,
I wonder if you could help me.
I use your example but I am hanging ...
I set up a linear regression tool, connected the R output of the regression tool to the R-icon containing your macro code, but it does not work as it das not find the input #1.
do I miss a step in between?
thanks
best wishes
gerald