Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Pulling data from Linear Regression report output

mdoo
5 - Atom
I'm using the Linear Regression tool and would like to continue to use some of it's output. Specifically, fields from the coefficients table such as intercept. Is there a way to do this?
16 REPLIES 16
rahuls
9 - Comet
Hi

As per your query, what you can do is:
1. Open the Linear Regression Tool and Goto R tool.
2. From the first output (1) of R tool, you can parse the data (using Text to Columns Tool) and get the required output you would like to continue with.
3. For furthur information, you can check the following snapshot of customized Linear Regression. The highlighted area is the area of your interest.

.

Hope this will help you.
FDELABAL
5 - Atom

Hi, 

 

I'm confused as to how this works specifically. What are you parsing by when you add the text/column function?? The report that comes out from that output is not broken down in a way where you can easily parse out what variable is tied to what coefficient, etc. 

 

This seems like a pretty basic functionality that's missing in Alteryx. You would think it's obvious that we want an easy way to use the regression output for further calculations (e.g., predicting what model says would be variable given x, y, z, etc.) Am I missing something? 

 

 

NeilR
Alteryx Alumni (Retired)

Here's a macro that will parse out a table from a report that comes out of a predictive tool. It's called Get Table From Report and it's embedded in an example to demonstrate its usage. Hope this helps...

chris_love
12 - Quasar
Hold on guys. There's a simple macro you can use without parsing the reports. It uses R code to flesh out the R model from the model output of any of the predictive tools. Its written to address exactly the issue the OP raised.

I'm not at my PC right now but I'll post it first thing (its v late here in UK).
chris_love
12 - Quasar

I have this macro in my old email address which has gone, but the solution is akin to this one: 

 

http://community.alteryx.com/t5/Advanced-Analytics/Spline-Tool-Output/td-p/325

 

Perhaps @DrDan can post the relevant R code we can turn into a macro as this link doesn't work for Linear Regression.

 

Chris

DrDan
Alteryx Alumni (Retired)

Attached is the macro to which Chris is referring. The input to the macro is an Alteryx data stream containing a model created by one of the standard Alteryx predictive modeling tools (e.g., Linear Regression) and the output is a table that consists of the names of the model coefficients along with the corresponding coefficient values. In addition, below is the R code within the macro (with added comments):

 

# Read in the Alteryx stream containing the model object

the.model <- read.Alteryx("#1")

 

# Make sure that it is in fact a model object, and return an error if it is not
if (!all(names(the.model) %in% c("Name", "Object")))
    stop.Alteryx("A model object was not provided")

 

# The model object has been serialized as a string, which on being read from Alteryx is

# converted to an R factor. The line of code first coerces the serialized model string to

# a string (in R a "character" data type) and the unserialized the model object
mod.obj <- unserializeObject(as.character(the.model$Object[1]))

 

# From the first test error check we know the Alteryx stream contained a model, but not

# all models have coefficients, so test to see if there are model coefficients, and throw an

# error otherwise
if (!is.null(mod.obj$coefficients)) {
    the.coefs <- mod.obj$coefficients
    write.Alteryx(data.frame(Variable = names(the.coefs), Coefficient = the.coefs))
} else {
    stop.Alteryx(paste("A model of class", class(mod.obj)[1], "does not have coefficients"))
}

 

This code and the corresponding macro will work for the Linear, Logistic, Count, and Gamma Regression tools. However, if the objective is to calculate predicted values from the model, then extracting the coefficients and using them in a formula tool (or in some other tool) is strongly discouraged. The reasons for this are: (1) the process of translating a set of model coefficients to a formula is often prone to serious error (particularly when categorical variables are used) and (2) doing this is likely to be a waste of time on the part of the user since the Score tool will acurately calculate fitted values at the very small cost of dragging one additional tool onto the canvas.

 

It turns out you can expose anything related to an R model object to Alteryx using an R tool, whether that is needed, or is even a good choice, is another matter.

jaituteja
5 - Atom

Hello DrDan,

 

Thanks for the macro. it is very helpful.

 

Is there any way to fetch the values of "R square", "Adjusted R square" and the "p-values of the corresponding coefficients" as well in the macro you have provided. I tried, however no luck.

 

Request you to kindly help for the same.

 

 

Thanks,

Jai

DrDan
Alteryx Alumni (Retired)

@jaituteja, sorry for a somewhat slow response. However, I do have a fairly complete one.

 

The issue you are likely running into is that the quantities you want are not stored in the model object, they are calculated by the summary methods for the linear regression model object. It turns out that the summary method also produces an object (it turns out everything in R is an object). One trick is R-squared and adjusted R-squared are scalars, while the p-values are a vector, so getting them output to Alteryx properly is a bit tricky. In the code below, I just don't extract the p-values, but the full coefficient summary table and write it to one Alteryx table (you can select only the p-values using a select tool in Alteryx), and then I get the R-squared and adjusted R-Squared values and write them to another table. Here is the R code that goes into an R tool:

 

# Read in the data stream containing the R lm model object and unserialize the object
mod_df <- read.Alteryx("#1")
mod_obj <- unserializeObject(as.character(mod_df$Object[1]))

# Get the summary object for the model, all the statistics of interest are in the summary
the_summary <- summary(mod_obj)

# Prepare the full coefficient results table, which includes the p-values
coef_table <- as.data.frame(the_summary$coefficients)
# Add the coefficient names to the table
coef_table <- cbind(data.frame(var = row.names(coef_table)), coef_table)
# Provide R friendly names to the table fields
names(coef_table) <- c("Variable", "Coefficient", "StdError", "t_value", "p-value")
# Write the coefficient table out to Alteryx in the first R tool output
write.Alteryx(coef_table)

# Write out the R-Squared and Adjusted R-Squared values to the second R tool output as a data frame with one row and two columns
write.Alteryx(data.frame(R_Sqr = the_summary$r.squared, Adj_R_Sqr = the_summary$adj.r.squared), 2)

 

 

 

In terms of the Alteryx macro, the original Model Coefficients macro is a good template, with the exception that there will now need to be two macro output tools, one for the coefficient table, the other for the R-squared values.

 

Dan

gbpwc
5 - Atom

hi,

I wonder if you could help me.

I use your example but I am hanging ...

I set up a linear regression tool, connected the R output of the regression tool to the R-icon containing your macro code, but it does not work as it das not find the input #1.

do I miss a step in between?

thanks

best wishes

gerald

 

Labels