Alteryx Designer Desktop Discussions

JoeMarco · ‎10-07-2021

Hello, I'm looking for some minor help with the attached workflow. Essentially I am:

Removing an individual store's data from my analysis
Applying Pearson correlation to find the variables whose correlation with the target meets a defined threshold
Using those variables as predictors in a linear regression model for the target variable (revenue)
Taking that model and scoring the store I left out at the start
Calculating the % difference between the forecasted revenue and the actual revenue
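
For readers without the attached workflow, the steps above can be sketched outside Alteryx roughly as follows. This is only an illustration on synthetic data: the column names, the 0.3 threshold, and the choice of which store to hold out are all assumptions, and the error formula (absolute difference divided by actual) is one common definition, not necessarily the one the problem set uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 50 stores, 4 candidate predictors.
n_stores = 50
names = ["strong_1", "strong_2", "noise_1", "noise_2"]  # hypothetical names
X = rng.normal(size=(n_stores, len(names)))
revenue = (150_000 + 15_000 * X[:, 0] + 15_000 * X[:, 1]
           + rng.normal(scale=2_000, size=n_stores))

# 1. Hold one store (row 0 here) out of the analysis.
held_out = 0
train = np.arange(n_stores) != held_out

# 2. Pearson correlation of each predictor with revenue, training rows only.
r = np.array([np.corrcoef(X[train, j], revenue[train])[0, 1]
              for j in range(len(names))])

# 3. Keep predictors whose ABSOLUTE correlation meets the threshold.
threshold = 0.3  # assumed value
selected = np.flatnonzero(np.abs(r) >= threshold)

# 4. Fit ordinary least squares (intercept + selected predictors).
A = np.column_stack([np.ones(train.sum()), X[train][:, selected]])
coef, *_ = np.linalg.lstsq(A, revenue[train], rcond=None)

# 5. Score the held-out store with the fitted coefficients.
x_new = np.concatenate([[1.0], X[held_out, selected]])
predicted = float(x_new @ coef)

# 6. Relative difference between forecast and actual revenue.
actual = float(revenue[held_out])
pct_diff = abs(predicted - actual) / actual

print([names[j] for j in selected], round(predicted, 2), round(pct_diff, 4))
```

The point of the sketch is step 3: the set of predictors that passes the threshold determines the fitted model, so a different selection changes both the forecast and the error.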

The expected output can be seen at the top of the workflow (score 163643.334625 with an error of 0.0964)

However, my output is a forecasted revenue of 162031.049182 with an error of 0.105302956229071.

Can anyone help me figure out why I am off by ~1% of the expected outcome? Have been staring at this for a few hours and trying to troubleshoot with no success.
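
To put a number on the gap, here is a small arithmetic check using only the four figures quoted above. The variable names are made up for illustration.

```python
# Figures quoted in the post.
expected_forecast = 163643.334625
expected_error = 0.0964
my_forecast = 162031.049182
my_error = 0.105302956229071

# Gap between the two error figures, in absolute terms.
error_gap = my_error - expected_error

# Relative gap between the two forecasts.
forecast_gap = (expected_forecast - my_forecast) / expected_forecast

print(f"error gap: {error_gap:.4f}, forecast gap: {forecast_gap:.4%}")
```

Both gaps come out a little under one percent, which matches the "~1%" mentioned above.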

Thanks!

AkimasaKajitani · ‎10-07-2021

Hi @JoeMarco ,

I think the reason is that you selected parameters with lower correlations.

If you use the Association Analysis tool, it shows which parameters are appropriate.

You can select the parameters marked with *.

Following this, I checked the parameters below (Dairy_Shr ~ Floral_Shr).

The result is as follows.

JoeMarco · ‎10-07-2021

Akimasa, thank you so much! This helped me get the answer I was looking for. I was wondering if I could bother you with a few more questions, just so I know what went wrong in my workflow:

Within Alteryx, in this use case, what is the advantage of using the Association Analysis tool over the Pearson Correlation tool? I see they gave different results when I went back to check, and I was wondering if you could explain simply why Association Analysis was the better tool here. Was the Pearson Correlation tool I was using before giving me wrong measures, so that I was selecting the wrong variables?
I see what went wrong in my workflow: from the Pearson Correlation tool I was picking lower-correlation variables as my model inputs, so they did not match the variables in the Association Analysis output.

Any general guidance you can share on best practices here, and on why this occurred, would be a great help in avoiding this in the future.

AkimasaKajitani · ‎10-08-2021

Hi @JoeMarco ,

Honestly, both tools output the same result. The difference is in how the result is presented.

The advantage of the Association Analysis tool is that its output is easier to understand.

This is because you can set the target field in the Association Analysis tool, and it directly outputs the correlation between the target variable and each predictor variable.

If you use the Pearson Correlation tool, you should pick the variables whose correlations have large absolute values.
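
As a rough illustration of this point (plain NumPy on synthetic data, not how either Alteryx tool is implemented), the target's row of a full correlation matrix holds the same numbers as correlating the target with each predictor directly. The predictor names are hypothetical. Ranking by absolute value matters because a strong negative correlation is just as useful to a linear model as a strong positive one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic data (assumption): one target and three hypothetical predictors.
target = rng.normal(size=n)
predictors = {
    "pos_strong": target + rng.normal(scale=0.5, size=n),
    "neg_strong": -target + rng.normal(scale=0.5, size=n),
    "weak": rng.normal(size=n),
}
names = list(predictors)
data = np.column_stack([target] + [predictors[k] for k in names])

# View 1: full correlation matrix, then read off the target's row.
corr_matrix = np.corrcoef(data, rowvar=False)
from_matrix = corr_matrix[0, 1:]

# View 2: correlate the target with each predictor directly.
direct = np.array([np.corrcoef(target, predictors[k])[0, 1] for k in names])

# Same numbers either way; only the presentation differs.
assert np.allclose(from_matrix, direct)

# Rank by absolute value so strong negative correlations are not overlooked.
ranked = sorted(zip(names, from_matrix), key=lambda kv: abs(kv[1]), reverse=True)
for name, value in ranked:
    print(f"{name}: {value:+.3f}")
```

Sorting on the signed value instead would push `neg_strong` to the bottom of the list, which is one easy way to end up with a weaker set of predictors.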
