
Alteryx Designer Desktop Discussions

SOLVED

Linear Regression Output Discussion: Multiple R-square or R-square

MangoBro
7 - Meteor

Hi All,

 

Recently I was playing with the Regression tool and got a little confused when interpreting the output, especially R-square and its variations.

 

For example, for the same data points (see attached), Alteryx and Excel reported slightly different output:

 

Alteryx reported both Multiple R-square and Adj R-square, while Excel reported both of those in addition to R-square. However, it seems that

                     Multiple R-sq (Alteryx) = R-square (Excel)

Questions:

1. Is the equation above correct? I ask because I thought the value of Multiple R-sq should be R-square^0.5.

2. Is it possible to change the number of decimal places shown in Alteryx? E.g., Multiple R-sq is currently shown as 0.992; where (probably in the macro behind the tool) can we increase the number of decimals to show 0.9920, if that is possible?

 

Alteryx

Alteryx-R-sq.PNG

Excel

Excel-R-sq.PNG

 

Thanks in advance!

 

4 REPLIES
tcroberts
12 - Quasar

Multiple R-Squared is simply a standard R-Squared value for models with more than one "x", or predictor variable. This means that any R-Squared value you get when using multiple predictors is technically Multiple R-Squared. So the equation above your questions is correct: Multiple R-Squared in Alteryx should be the same as the R-Squared value you're getting from Excel.
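
The square-root relationship raised in the question applies to what Excel's regression output labels "Multiple R" (the correlation between fitted and observed values), not to Multiple R-Squared. Here is a minimal sketch in R to check that on simulated data; the column names y, x1 and x2 are made up for illustration and have nothing to do with the attached file:

```r
# Simulated data; y, x1, x2 are illustrative names only.
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 2 + 1.5 * dat$x1 - 0.8 * dat$x2 + rnorm(50, sd = 0.5)

model <- lm(y ~ x1 + x2, data = dat)
fit <- summary(model)

r_squared  <- fit$r.squared              # printed by R as "Multiple R-squared"
multiple_r <- cor(fitted(model), dat$y)  # correlation of fitted vs. observed y

# For a model with an intercept, Multiple R squared equals R-squared.
print(c(multiple_r = multiple_r, r_squared = r_squared))
print(all.equal(multiple_r^2, r_squared))
```

So 0.992 from Alteryx should line up with Excel's "R Square" row, and its square root with Excel's "Multiple R" row.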

 

Adjusted R-Squared is an alternate metric which is used when you want to make comparisons between models that have different numbers of predictors. Due to the way that standard R-Squared is defined, adding a predictor variable will *never* decrease its value (and in practice almost always increases it), even when there is no predictive power in the added variable. As a result, Adjusted R-Squared includes a penalty term for additional variables, so that for your model to improve, the increase in predictive power needs to be enough to offset the additional penalty from adding the variable. Here's a StackExchange post with a better explanation, and I'm happy to clear up any confusion about this.
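
To make the penalty concrete, here is a small sketch on simulated data (the variable names and the pure-noise predictor are assumptions for illustration). It fits a model with and without an irrelevant predictor and recomputes Adjusted R-Squared by hand using the usual formula 1 - (1 - R^2) * (n - 1) / (n - p - 1), where p is the number of predictors:

```r
# Simulated data: y depends on x1 only; "noise" has no relationship to y.
set.seed(42)
n <- 40
dat <- data.frame(x1 = rnorm(n), noise = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)

small <- summary(lm(y ~ x1, data = dat))
large <- summary(lm(y ~ x1 + noise, data = dat))

# Adjusted R-squared for a model with an intercept and p predictors.
adj_by_hand <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# R-squared cannot go down when a predictor is added ...
print(c(r2_small = small$r.squared, r2_large = large$r.squared))
# ... while adjusted R-squared is allowed to go down.
print(c(adj_small = small$adj.r.squared, adj_large = large$adj.r.squared))
print(adj_by_hand(large$r.squared, n, p = 2))
```

Whether the adjusted value actually drops depends on the simulated draw, so treat the printed numbers as an illustration rather than a guarantee.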

 

As for your second question, it does seem possible, although I've never looked too closely into the Linear Regression macro, so I'm not 100% sure how to go about this.

 

At first glance, I've pinpointed R Tool (170) in the macro as the one that generates the interactive report going to the I output. It contains a loop where a variable called "dashboard" is set, built from the data and the model. I believe this is what you'd want to look at to increase the number of decimals.

 

If you're having trouble changing the macro, I'd also recommend just trying it out in the R tool yourself, as the `lm` function is rather easy to use and extract details from.

 

You can write something as simple as:

 

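# Read the data stream connected to input #1 of the R tool
dat <- read.Alteryx('#1')
# Fit a linear model; replace y, x1, x2, x3 with your own column names
model <- lm(y ~ x1 + x2 + x3, data = dat)
# Print coefficients, R-squared, adjusted R-squared and p-values
summary(model)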

to fit a linear model to your data.

 

You should look into the broom package (here) to tidy the goodness-of-fit metrics into a nice format that you can read back into Alteryx.
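
If you'd rather not depend on an extra package, a rough base-R sketch of the same idea is below. It collects a few goodness-of-fit metrics into a data frame and formats them with a fixed number of decimals, which also covers the 0.992 vs. 0.9920 question. The data here is simulated so the snippet runs on its own, and the column names are assumptions; inside the R tool you would read your real data with read.Alteryx instead.

```r
# Simulated stand-in for the Alteryx input; in the R tool use:
#   dat <- read.Alteryx('#1')
set.seed(7)
dat <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
dat$y <- 0.5 + dat$x1 + 2 * dat$x2 - dat$x3 + rnorm(30, sd = 0.3)

model <- lm(y ~ x1 + x2 + x3, data = dat)
fit <- summary(model)

# One row per metric, keeping the full-precision numeric value.
metrics <- data.frame(
  metric = c("Multiple R-squared", "Adjusted R-squared", "Residual std. error"),
  value  = c(fit$r.squared, fit$adj.r.squared, fit$sigma)
)

# Fixed 4 decimals as text, e.g. "0.9920" rather than "0.992".
metrics$formatted <- formatC(metrics$value, format = "f", digits = 4)
print(metrics)

# In the R tool, write.Alteryx(metrics, 1) would send this table to output 1.
```

Keeping the numeric column alongside the formatted one means you can still round differently downstream in Alteryx.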

 

Let me know if this helps, or you have any other questions.

 

Cheers!

MangoBro
7 - Meteor

Awesome! Really appreciate your help on this!

MangoBro
7 - Meteor

Hey,

 

Hope you had a great weekend.

 

Same regression output, different stat. It seems that for a highly significant p-value, Alteryx gives an inequality instead of a precise number. I know it's small enough for decision-making, but is there a reason for this, since it doesn't seem like a big effort to show the exact value? See the Excel screenshot below for comparison. Thanks.

 

Alteryx

Alteryx_P-value.PNG

Excel

Excel_P-value.PNG

 

tcroberts
12 - Quasar

This is actually coming through from R. It's pretty much just saying that the p-value is very close to 0. Floating-point numbers have limited precision, so values this small can't be reliably told apart from 0, and R's printed output reports them as being below a threshold instead. In this case, that line is saying that p is smaller than about 2.2e-16 (0.00000000000000022), which is R's machine epsilon for doubles and is close enough to 0 to call it 0. It still gives you the threshold, though, so you can make an informed decision about whether it's precise enough.

 

I believe that if you use the R tool to run that regression and take a look at the output, the p-value stored in the regression summary will actually have full precision (double precision, which can represent positive values from roughly 2.2e-308 up to 1.8e+308), and if you require further granularity you can extract it from there.
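
As a sketch of what that extraction could look like (simulated data and made-up column names, so this is an illustration rather than what the Alteryx macro does internally): the coefficient table from summary() is a plain numeric matrix, and the "< ..." form only appears when the value is formatted for display.

```r
# Simulated data with a strong relationship so the p-value is tiny.
set.seed(3)
dat <- data.frame(x1 = rnorm(200))
dat$y <- 3 * dat$x1 + rnorm(200)

model <- lm(y ~ x1, data = dat)
coefs <- summary(model)$coefficients  # numeric matrix, one row per term

p_raw <- coefs["x1", "Pr(>|t|)"]      # stored as an ordinary double

print(format.pval(p_raw))             # display form: reported as below a threshold
print(format(p_raw, digits = 6))      # the stored value in scientific notation
print(.Machine$double.eps)            # default threshold used by format.pval
```

In the R tool you could put p_raw into a data frame and pass it to an output to compare against Excel's figure.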

 

Let me know if this helps,

 

Cheers!
