SOLVED

Prediction modeling

aubh

Hi community, 

 

I have some store data and would like to build a predictive model to forecast Revenue, so that I can calculate the percent error between the holdout (actual) values and the forecast.

 

What I have done is following the requirement guide: 

1. Use the Select tool to split the holdout data from the dataset.

2. Run a Pearson correlation on the filtered dataset from #1 to find the variables that meet the p-value requirement.

3. Perform a linear regression using the variables found in #2.

4. Score the holdout store data using the model from #3.
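The four steps above can be sketched in plain Python as a rough illustration (standard library only; this is not Alteryx). Everything here is hypothetical: the toy store data, the column names `sq_ft`, `households` and `revenue`, and the choice of the last 10 rows as the holdout. Step 2 is not shown; the `features` list simply stands in for whatever variables it keeps. The point of the sketch is that the regression in step 3 is fit on the original per-store rows, restricted to the chosen columns.

```python
import random

# Hypothetical toy data (NOT the poster's store data): 40 stores, two candidate
# predictors, and Revenue built from a known linear rule plus random noise.
random.seed(0)
rows = []
for store_id in range(40):
    sq_ft = random.uniform(1000, 5000)
    households = random.uniform(200, 2000)
    revenue = 50 * sq_ft + 120 * households + random.gauss(0, 5000)
    rows.append({"store": store_id, "sq_ft": sq_ft,
                 "households": households, "revenue": revenue})

# Step 1: split off the holdout stores (last 10 rows, an arbitrary choice here).
train, holdout = rows[:30], rows[30:]


def fit_ols(data, features, target):
    """Step 3: ordinary least squares via the normal equations (X^T X) b = X^T y.

    The fit uses the original per-store rows, not a table of correlation values.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [row[f] for f in features] for row in data]  # 1.0 = intercept column
    y = [row[target] for row in data]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def score(model, data, features):
    """Step 4: apply the fitted coefficients to each holdout row."""
    intercept, *coefs = model
    return [intercept + sum(c * row[f] for c, f in zip(coefs, features))
            for row in data]


features = ["sq_ft", "households"]  # stands in for the variables kept in step 2
model = fit_ols(train, features, "revenue")
predicted = score(model, holdout, features)
for row, pred in zip(holdout, predicted):
    print(row["store"], round(row["revenue"]), round(pred))
```

Solving the normal equations directly is fine for a handful of predictors; for real data a library least-squares routine is numerically safer.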

 

At this point, my predicted values are no different from the holdout (actual) data and are far from the expected output, so I can't move on to the percent error calculation. I've attached my workflow below. Can anyone shed some light?

 

BTW: I am not quite sure about one of the requirements: "Find all variables that have a significant Pearson correlation (p < .1) to Revenue." My understanding of Pearson correlation is that a p value closer to 1 means the value is more significant. What could be the reasons for requiring variables with p < .1 in data analytics?
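For reference, the usual convention runs the other way: a smaller p-value means stronger evidence that the true correlation is not zero, so "p < .1" keeps only variables whose correlation with Revenue is unlikely to be chance at the 10% level. The sketch below is plain Python (not an Alteryx tool) and computes Pearson r plus the two-sided p-value from the standard t-test, then applies the p < 0.1 filter. The variable names and numbers are made up for illustration; the incomplete-beta routine follows the well-known continued-fraction method.

```python
from math import exp, lgamma, log, sqrt

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued-fraction evaluation used by the regularized incomplete beta."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """Two-sided Student's t p-value: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def pearson_with_p(x, y):
    """Pearson r and the two-sided p-value for H0: true correlation = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0
    df = n - 2
    t = r * sqrt(df / (1.0 - r * r))
    return r, two_sided_p(t, df)


# Hypothetical data: one candidate tracks revenue closely, one does not.
revenue = [120, 135, 150, 160, 180, 190, 210, 220, 240, 255]
candidates = {
    "sq_ft": [10, 12, 13, 15, 16, 18, 19, 21, 23, 24],
    "competitors": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
}
selected = []
for name, values in candidates.items():
    r, p = pearson_with_p(values, revenue)
    print(f"{name}: r = {r:.3f}, p = {p:.4f}")
    if p < 0.1:  # the requirement's significance threshold
        selected.append(name)
print("keep:", selected)
```

Note that the filter is on p, not on r: a variable with a strong negative correlation can still pass.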

 

 

FrederikE

Hey @aubh,

 

See my attached Workflow. 

 

You fed the Pearson correlations into the Regression Tool. They are only used to identify which variables to use; the Regression Tool still needs the original data.

 

Hope this helps 😉

 

 

aubh

Hi FrederikE, 

 

Thanks for the reply. I shouldn't have sent that output directly to the Linear Regression tool.

 

Another question: does the Pearson Correlation tool output p-values or correlation coefficients? My understanding is that it outputs correlation coefficients, because a p-value always lies in [0, 1] while a correlation coefficient lies in [-1, 1]. I've also found the following topic (https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/How-To-Complete-Data-Preparation-An... p-value comes from Association Analysis Tool. However, the Association Analysis Tool outputs a report, and I am not sure whether there is a way to filter it and use those variables directly.
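For the standard test of zero correlation, r (in [-1, 1]) and its p-value (in [0, 1]) are linked through the number of rows n via a t-statistic with n - 2 degrees of freedom. Whether the Association Analysis Tool computes its p-values exactly this way is an assumption here, not something confirmed in this thread.

```latex
t = r\sqrt{\frac{n-2}{1-r^{2}}}, \qquad
p = 2\,\Pr\!\left(T_{n-2} \ge |t|\right)
```

So the same r becomes more significant (smaller p) as n grows, and r = 0 always gives p = 1.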

FrederikE

True. In our workflows the Pearson Correlation is used, and the coefficient has to be large in absolute value for a correlation to be meaningful.

You might want to use the Association Analysis Tool to determine the corresponding p-values.

 

I am not sure if I understand what you are trying to do, since the input into the Linear Regression Tool should always be the original data and not the correlation/p-values.  

As you can see from my approach, this also leads to a reasonable result (10% error), although the wrong variables might have been chosen.
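The thread does not say exactly how the "10% error" is defined, so here is a small Python sketch of two common readings: the mean absolute percent error over the holdout stores, and the percent error of the total forecast against the total actual. Which one the requirement intends is an assumption, and the numbers are hypothetical.

```python
def percent_error(actual, forecast):
    """Signed percent error of one forecast: positive means over-forecast."""
    if actual == 0:
        raise ValueError("percent error is undefined when the actual value is 0")
    return (forecast - actual) * 100 / actual


def mean_absolute_percent_error(actuals, forecasts):
    """MAPE: average of |forecast - actual| / |actual| over all rows, in percent."""
    errors = [abs(percent_error(a, f)) for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


# Hypothetical holdout values for two stores.
actuals = [100.0, 200.0]
forecasts = [110.0, 180.0]
print(mean_absolute_percent_error(actuals, forecasts))  # 10.0
print(percent_error(sum(actuals), sum(forecasts)))       # about -3.33
```

The two readings can differ a lot: per-store errors of opposite sign cancel in the total but not in the MAPE.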

aubh

Yep, I totally agree with you about the input to the linear regression. I made a mistake at the very beginning by using the output from the Pearson Tool.

I think I've just solved it. I've attached my updated workflow.

 

However, there is one thing I am still not sure about: the Association Analysis Tool only produces a report for the Browser Tool. Is there a way to output the Association Analysis results and filter for the variables I need, similar to how you previously used the output of the Pearson Tool?
