Hi All,
I have a dataset for which I need to build a Linear Regression Model using Predictive Module in Alteryx.
I hereby attach the dataset for your reference.
Use case :
Predicting Catalog Demand
The task is to predict how much money the company can expect to earn from sending out a catalog to new customers. We need to build a linear regression model and apply the result to the mail-order catalog business problem.
The two data files are: p1-customers & p1-mailinglist.
We need to predict the expected profit from the 250 new customers in the mailing list file. We will go ahead with printing the catalogs to send to these new customers only if we can expect a profit of more than $10,000. Because we are data rich and trying to solve a continuous numeric problem, we will use a linear regression model.
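The go/no-go rule (send the catalog only if the total expected profit from the 250 customers exceeds $10,000) can be sketched in a few lines of Python. This is only an illustration: the per-customer figures are made-up placeholders rather than values from the attached files, and `should_send_catalog` is a hypothetical helper name.

```python
def should_send_catalog(expected_profits, threshold=10_000.0):
    """Return (total, decision); send only if the total exceeds the threshold."""
    total = sum(expected_profits)
    return total, total > threshold


# Placeholder predictions for 250 new customers (NOT real data).
predicted = [45.0] * 250
total, send = should_send_catalog(predicted)
print(f"Expected profit: ${total:,.2f} -> send catalog: {send}")
```

In a real workflow the list would hold the model's predicted profit for each customer in the mailing list.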
It would be of great help if someone could guide me on how to build the workflow in Alteryx. Once the model is developed, I would also like to know how to test it with a new dataset and how to publish the workflow to Alteryx Server/Gallery.
Thanks,
Karthik
Hi... Are you missing your dependent variable, or is Score_Yes your dependent variable? If it is, it looks like it is designed as a 1/0 (true/false) value. Also, since you are building your model on your test data, you need to decide whether each row is a 1 (yes) or a 0 (no).
This is probably better suited to logistic regression or another yes/no style model (a tree model?). As a linear regression it's not giving me much:
Residuals:
Min | 1Q | Median | 3Q | Max
-0.2375 | -0.1152 | -0.0659 | 0.0579 | 0.6975
(Coefficient table not preserved.)
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Response: Score_Yes (table contents not preserved.)
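To illustrate the point about a 1/0 target, here is a small Python sketch with made-up numbers (not the attached data). It fits an ordinary least-squares line to a yes/no response and shows that the fitted values can fall outside the 0 to 1 range, which is one reason logistic regression is usually preferred for this kind of target. The helper name `fit_line` exists only for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up predictor values with a 0/1 response.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(x, y)
low = intercept + slope * 1
high = intercept + slope * 8
print(f"prediction at x=1: {low:.3f}")   # falls below 0
print(f"prediction at x=8: {high:.3f}")  # falls above 1
```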
Hi @Karthik_7694 ,
Great to see a predictive use case!
I created an example predictive workflow going over data health, picking variables and testing the model on a validation dataset.
I used the customer list data to calculate spend, prepared the data for the predictive model, split it into training and validation data sets, and then used a linear regression to predict spend. The Score tool at the end applies the model to the validation data set, so you can compare actual spend with predicted spend.
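For anyone who wants to see the same split / fit / score sequence outside Alteryx, here is a rough Python sketch. It is not the attached workflow: the data is synthetic, a single made-up predictor (`items`) stands in for the real fields, and the 70/30 split is an assumption.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(42)  # fixed seed so the split is reproducible

# Synthetic stand-in data: spend is roughly 50 + 20 * items, plus noise.
rows = []
for _ in range(200):
    items = rng.randint(1, 10)
    spend = 50 + 20 * items + rng.gauss(0, 15)
    rows.append((items, spend))

# Shuffle, then hold out 30% of the rows for validation.
rng.shuffle(rows)
cut = int(len(rows) * 0.7)
train, valid = rows[:cut], rows[cut:]

# Fit on the training rows only.
intercept, slope = fit_line([r[0] for r in train], [r[1] for r in train])

# Score the validation rows and compare actual with predicted spend.
scored = [(spend, intercept + slope * items) for items, spend in valid]
for actual, predicted in scored[:5]:
    print(f"actual={actual:8.2f}  predicted={predicted:8.2f}")
```

Keeping the validation rows out of the fit is what makes the actual-versus-predicted comparison a fair test of the model.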
I didn't get good results (R-squared of ~0.5, meaning only about half of the variation in spend is explained by the model), and I think more data is needed. Looking at the model results, the only significant predictor of spend was the customer segment. (Using average spend as a predictor is not best practice, because it already contains information about the target, spend.)
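As a reminder of what the R-squared figure measures: it is one minus the ratio of residual variation to total variation. A small Python sketch with made-up numbers (not taken from the workflow), chosen so that the result is exactly 0.5:

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [10.0, 20.0, 30.0, 40.0]
predicted = [20.0, 15.0, 35.0, 30.0]

# SS_res = 100 + 25 + 25 + 100 = 250 and SS_tot = 225 + 25 + 25 + 225 = 500
print(r_squared(actual, predicted))  # 0.5
```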
If you have time series data, you could try time series forecasting to predict spend.
Please mark this as the solution if it answers your question; it will help others find solutions more quickly.
Kind Regards,
Kilian
Solutions Engineer - Alteryx