Hi community,
I have some store data and would like to build a predictive model that forecasts Revenue, so that I can calculate the percent error between the hold-out (actual) values and the forecast.
Here is what I have done so far, following the requirement guide:
1. Use the Select tool to split the hold-out data from the rest of the data set.
2. Run a Pearson correlation on the filtered data set from #1 to find the variables that meet the p-value threshold in the requirement.
3. Fit a linear regression on the variables found in #2.
4. Score the hold-out store data with the model from #3.
Up to this point, my predicted values show no change from the hold-out (actual) values, and they are also far from the expected output, so I cannot move on to the percent error calculation. I have attached the workflow below. Can anyone please shed some light?
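For reference, this is the percent error calculation I am aiming for, written as a small Python sketch. The numbers are made up, and the sign convention (forecast minus actual, divided by actual) is my assumption, since the requirement guide may define it the other way round.

```python
def percent_error(actual, forecast):
    """Signed percent error of a forecast relative to the actual value.
    Assumed convention: (forecast - actual) / actual * 100."""
    return (forecast - actual) / actual * 100

# Hypothetical hold-out revenue and model forecasts (made-up numbers).
actuals = [200.0, 400.0, 500.0]
forecasts = [210.0, 380.0, 500.0]

# Per-store percent error, then the mean absolute percent error (MAPE).
errors = [percent_error(a, f) for a, f in zip(actuals, forecasts)]
mape = sum(abs(e) for e in errors) / len(errors)

# If every forecast equals its actual value, every error is 0%, which
# would suggest the actual field was scored instead of the model output.
identical = all(a == f for a, f in zip(actuals, forecasts))

for a, f, e in zip(actuals, forecasts, errors):
    print(f"actual={a}, forecast={f}, percent error={e:.2f}%")
print(f"MAPE={mape:.2f}%, forecasts identical to actuals: {identical}")
```

With my current workflow the forecast column matches the actual column, so a calculation like this would return 0% for every store, which is why I think something upstream is wrong.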
BTW: I am not quite sure about one of the requirements: "Find all variables that have a significant Pearson correlation (p < .1) to Revenue." My understanding of Pearson correlation is that the closer the value of p is to 1, the more significant it is. Why would we want the variables with p < .1 in data analytics?