Stuck on problem set - Linear Regression scoring with p correlation
Hello, I'm looking for some minor help with the attached workflow. Essentially, I am:
- Removing one individual store's data from my analysis
- Applying Pearson correlation to find the variables with the strongest correlation, based on a defined threshold
- Using those variables as predictor variables in a linear regression model, with revenue as the target variable
- Scoring the store I left out at the start with the model above
- Calculating the % difference between the forecasted revenue and the actual revenue
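For anyone following along outside Alteryx, the steps above can be sketched in Python with NumPy. This is only an illustration under assumptions: the function name `score_holdout_store`, the toy data and the column names are hypothetical, the correlation filter compares the absolute value of Pearson r with the target against the threshold, and ordinary least squares stands in for the Linear Regression tool.

```python
import numpy as np

def score_holdout_store(X, y, store_ids, holdout_store, names, threshold):
    """Hypothetical sketch: hold out one store, keep predictors whose
    |Pearson r| with the target meets `threshold` (measured on the remaining
    stores), fit ordinary least squares, then score the held-out store."""
    train = store_ids != holdout_store
    test = ~train
    # Pearson r of each candidate predictor with the target, training rows only
    r = np.array([np.corrcoef(X[train, j], y[train])[0, 1]
                  for j in range(X.shape[1])])
    keep = np.abs(r) >= threshold  # magnitude, so strong negative correlations count
    # Design matrices with an intercept column
    A_train = np.column_stack([np.ones(train.sum()), X[train][:, keep]])
    A_test = np.column_stack([np.ones(test.sum()), X[test][:, keep]])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    predicted = A_test @ coef
    actual = y[test]
    # % difference between forecast and actual, relative to the actual value
    pct_error = np.abs(actual - predicted) / np.abs(actual)
    selected = [name for name, k in zip(names, keep) if k]
    return selected, predicted, pct_error

# Hypothetical toy data: one row per store, revenue driven by two of three features
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1000 + 50 * X[:, 0] - 30 * X[:, 1] + rng.normal(scale=5, size=n)
store_ids = np.arange(n)
selected, predicted, pct_error = score_holdout_store(
    X, y, store_ids, holdout_store=7,
    names=["feature_a", "feature_b", "feature_c"], threshold=0.3)
print(selected, predicted, pct_error)
```

Note that the correlations are measured only on the stores kept for training, so the held-out store does not influence which variables are chosen.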
The expected output can be seen at the top of the workflow (score 163643.334625 with an error of 0.0964).
However, my output is a forecasted revenue of 162031.049182 with an error of 0.105302956229071.
Can anyone help me figure out why I am off by ~1% from the expected outcome? I have been staring at this for a few hours and troubleshooting with no success.
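A quick arithmetic check of the two reported results may help narrow things down. This sketch assumes (it is not stated in the workflow) that the error is defined as (actual - predicted) / actual and that both forecasts came in below the actual revenue; the helper names are hypothetical.

```python
def percent_error(predicted, actual):
    """Relative error of a forecast, measured against the actual value."""
    return abs(actual - predicted) / abs(actual)

def implied_actual(predicted, error):
    """Back out the actual value from a reported error, assuming the forecast
    came in below the actual (error = (actual - predicted) / actual)."""
    return predicted / (1 - error)

# Figures reported in the post
expected_pred, expected_err = 163643.334625, 0.0964
my_pred, my_err = 162031.049182, 0.105302956229071

actual_from_expected = implied_actual(expected_pred, expected_err)
actual_from_mine = implied_actual(my_pred, my_err)
print(actual_from_expected, actual_from_mine)

# Gap between the two forecasts, relative to the expected one
gap = (expected_pred - my_pred) / expected_pred
print(f"{gap:.2%}")
```

Under that assumption, both pairs point to the same actual revenue of roughly 181,100, and the two forecasts differ by just under 1%. That suggests the discrepancy sits in the prediction itself (i.e. the model inputs) rather than in the % difference step.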
Thanks!
- Labels:
- Developer
- Predictive Analysis
- R Tool
Hi @JoeMarco,
I think the reason is that you selected parameters with lower correlation.
If you use the Association Analysis tool, it will show you the appropriate parameters.
We can select the parameters marked with *.
Following this, I checked the parameters below (Dairy_Shr ~ Floral_Shr).
The result is as follows.
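To illustrate the idea of flagging predictors with *, here is a rough stand-in in plain Python. It is not what the Association Analysis tool computes internally: the p-value uses the Fisher z normal approximation, the 0.05 cutoff is an assumption, and the column names and numbers are hypothetical toy data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value for r != 0 using the Fisher z normal approximation
    (requires |r| < 1 and n > 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def correlation_with_target(columns, target, alpha=0.05):
    """One row per predictor: (name, r, approximate p, '*' if p < alpha)."""
    rows = []
    for name, values in columns.items():
        r = pearson_r(values, target)
        p = approx_p_value(r, len(target))
        rows.append((name, r, p, "*" if p < alpha else ""))
    return rows

# Hypothetical toy data (not the workflow's data): ten stores
revenue = [10, 12, 15, 18, 20, 24, 25, 29, 31, 35]
columns = {
    "share_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "share_b": [20, 18, 17, 15, 12, 11, 9, 6, 5, 2],
    "share_c": [5, 3, 6, 2, 7, 4, 6, 3, 5, 4],
}
table = correlation_with_target(columns, revenue)
for name, r, p, flag in table:
    print(f"{name:8s} r={r:+.3f} p~{p:.3g} {flag}")
```

Here the strongly negative `share_b` is flagged just like the strongly positive `share_a`, while the weakly related `share_c` is not.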
Akimasa, thank you so much! This helped me get the answer I was looking for. Could I bother you with a few more questions, just so I understand what went wrong in my workflow?
- Within Alteryx, in this use case, what is the advantage of using the Association Analysis tool vs. the Pearson Correlation tool? When I went back to check, I saw they gave different results, and I was wondering if you could briefly explain why Association Analysis was the best tool here. Was the Pearson Correlation tool I was using before giving me wrong measures, so that I was selecting the wrong variables?
- I see what went wrong in my workflow: the Pearson Correlation tool was giving me variables with lower correlation that I was considering as model inputs, so my variables did not match the ones flagged in the Association Analysis output.
Any general guidance you can share on best practices here, and on why this occurred, would be great to help me avoid this in the future.
Hi @JoeMarco,
Honestly, both tools output the same result; the difference is in how it is presented.
The advantage of the Association Analysis tool is that it is easy to interpret,
because you can set the target field in the tool, and it outputs the correlations between the target variable and the predictor variables directly.
If you use the Pearson Correlation tool instead, you should pick the variables whose correlations with the target have large absolute values.
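A minimal NumPy sketch of that point, assuming hypothetical toy data and a hypothetical threshold of 0.5: the full correlation matrix (roughly what a Pearson correlation matrix gives you) already contains each predictor's r with the target, and filtering on the signed value instead of the absolute value silently drops a strongly negative predictor.

```python
import numpy as np

# Hypothetical toy data (not the workflow's data): one row per store
names = ["share_a", "share_b", "share_c"]
predictors = np.array([
    [1, 20, 5], [2, 18, 3], [3, 17, 6], [4, 15, 2], [5, 12, 7],
    [6, 11, 4], [7, 9, 6], [8, 6, 3], [9, 5, 5], [10, 2, 4],
], dtype=float)
revenue = np.array([10, 12, 15, 18, 20, 24, 25, 29, 31, 35], dtype=float)

# Full Pearson matrix over every column, target included as the last column
corr = np.corrcoef(np.column_stack([predictors, revenue]), rowvar=False)
# Keep only the target's row: each predictor's r with revenue
r_with_target = corr[-1, :-1]

threshold = 0.5  # hypothetical cutoff
by_signed = [n for n, r in zip(names, r_with_target) if r >= threshold]
by_abs = [n for n, r in zip(names, r_with_target) if abs(r) >= threshold]
print(by_signed)
print(by_abs)
```

Only the absolute-value filter keeps `share_b`, whose correlation with revenue is strongly negative.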
