Stuck on problem set - Linear Regression scoring with Pearson correlation

Hello, I'm looking for some minor help with the attached workflow. Essentially, I am:

* Removing some individual store data from my analysis
* Applying Pearson correlation to find the variables whose correlation meets a defined threshold
* Using those variables as predictors in a linear regression model for the target variable (revenue)
* Taking that model and scoring the store I left out at the start
* Computing the % difference between the forecasted revenue and the actual revenue

The expected output can be seen at the top of the workflow (score 163643.334625 with an error of 0.0964).

However, my output is a forecasted revenue of 162031.049182 with an error of 0.105302956229071.

Can anyone help me figure out why I am off by ~1% from the expected outcome? I have been staring at this for a few hours and troubleshooting with no success.

Thanks!

Linear Regression Problem Set.yxmd
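As a quick sanity check on the numbers in the question, here is a minimal Python sketch. It assumes the error is computed as (actual - predicted) / actual, which the post does not state, and the helper names `relative_error` and `implied_actual` are hypothetical. It backs out the actual revenue implied by each score/error pair and measures the gap between the two errors.

```python
def relative_error(actual, predicted):
    """Relative error, ASSUMING the workflow uses (actual - predicted) / actual."""
    return (actual - predicted) / actual


def implied_actual(predicted, error):
    """Invert the assumed formula to recover the actual revenue."""
    return predicted / (1 - error)


# Figures quoted in the question.
expected_score, expected_error = 163643.334625, 0.0964
my_score, my_error = 162031.049182, 0.105302956229071

actual_from_expected = implied_actual(expected_score, expected_error)
actual_from_mine = implied_actual(my_score, my_error)
error_gap = my_error - expected_error

print(f"actual implied by expected output: {actual_from_expected:.2f}")
print(f"actual implied by my output:       {actual_from_mine:.2f}")
print(f"gap between the two errors:        {error_gap:.6f}")
```

Under that assumption, both pairs imply nearly the same actual revenue (the expected error is only quoted to four decimals, so a small mismatch is expected). That would suggest the % difference step is fine and the discrepancy comes from the model itself, for example from which predictor variables were selected.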

Developer

Predictive Analysis

R Tool

Accepted answers

AkimasaKajitani

Hi @JoeMarco ,

I think the reason is that you selected variables with lower correlation.

If you use the Association Analysis tool, it will show the appropriate variables.

We can select the variables that are marked with *.

Following this, I checked the variables below (Dairy_Shr ~ Floral_Shr).

The result is as follows.

Linear Regression Problem Set_AK.yxmd
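To illustrate the idea outside Alteryx, here is a rough Python sketch of picking predictors by flagging correlations with a `*`. It assumes the `*` marks a statistically significant association, which the answer does not spell out. The data is made up, `Noise_Shr` is a hypothetical column, the 0.05 cutoff is an assumption, and the p-value uses the Fisher z-transform approximation rather than whatever test the Association Analysis tool runs internally.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value via the Fisher z-transform (needs n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_predictors(columns, target, alpha=0.05):
    """Return the column names whose correlation with target has p < alpha."""
    selected = []
    for name, values in columns.items():
        r = pearson_r(values, target)
        p = approx_p_value(r, len(target))
        flag = "*" if p < alpha else ""
        print(f"{name:10s} r={r:+.3f} p={p:.4f} {flag}")
        if p < alpha:
            selected.append(name)
    return selected


# Toy data, invented purely for illustration.
revenue = [100, 120, 130, 150, 170, 180, 200, 220]
columns = {
    "Dairy_Shr": [0.10, 0.12, 0.12, 0.15, 0.16, 0.19, 0.20, 0.21],
    "Noise_Shr": [0.5, 0.2, 0.4, 0.3, 0.5, 0.2, 0.4, 0.3],  # hypothetical column
}

selected = select_predictors(columns, revenue)
print("predictors to keep:", selected)
```

Only the flagged columns would then go into the linear regression as predictors. With so few rows the normal approximation is crude, so treat the printed p-values as indicative only.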

All comments

JoeMarco

Akimasa, thank you so much! This helped me get the answer I was looking for. I was wondering if I could bother you with a few more questions, just so I know what went wrong in my workflow:

Within Alteryx, in this use case, what is the advantage of using the Association Analysis tool over the Pearson Correlation tool? I see they gave different results when I went back to check, and I was wondering if you could briefly explain why Association Analysis was the best tool here. Was the Pearson Correlation tool I was using before giving me wrong measures, so that I was selecting the wrong variables?
I see what went wrong in my workflow: the Pearson Correlation tool was giving me lower-correlation variables that I was considering as model inputs, so they did not match the variables in the Association Analysis output.

Any general guidance you can share on best practices here, and on why this occurred, would be great to help me avoid this in the future.
