Alteryx Workflow - Predictive Model

Karthik_7694

Hi All,

I have a dataset for which I need to build a linear regression model using the Alteryx Predictive tools.

I have attached the dataset for reference.

Use case:

Predicting Catalog Demand

The task is to predict how much money the company can expect to earn from sending a catalog to new customers. I need to build a linear regression model and apply the results to the mail-order catalog business problem.

The two data files are p1-customers and p1-mailinglist.

We need to predict the expected profit from the 250 new customers in the mailing list file. Only if we can expect a profit of more than $10,000 will we go ahead with printing the catalogs and sending them to these new customers. Because we have plenty of data and are predicting a continuous numeric value, we will use a linear regression model.
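The go/no-go rule in the use case can be sketched in a few lines of Python. This is a minimal sketch, not the poster's workflow: the gross margin, the per-catalog cost, and the idea of weighting each customer's predicted revenue by a purchase probability (for example, a Score_Yes column) are hypothetical assumptions that the post does not state. Only the $10,000 threshold comes from the post.

```python
# Minimal sketch of the catalog go/no-go rule (not the poster's actual workflow).
# ASSUMPTIONS (hypothetical, not stated in the post): GROSS_MARGIN, COST_PER_CATALOG,
# and weighting predicted revenue by a purchase probability such as Score_Yes.
from typing import Sequence

GROSS_MARGIN = 0.50        # hypothetical placeholder
COST_PER_CATALOG = 6.50    # hypothetical placeholder, in dollars
PROFIT_THRESHOLD = 10_000  # from the use case: send only above $10,000


def expected_profit(predicted_revenue: Sequence[float],
                    purchase_probability: Sequence[float],
                    margin: float = GROSS_MARGIN,
                    cost_per_catalog: float = COST_PER_CATALOG) -> float:
    """Probability-weighted revenue times margin, minus the cost of every catalog sent."""
    if len(predicted_revenue) != len(purchase_probability):
        raise ValueError("predicted_revenue and purchase_probability must have the same length")
    weighted_revenue = sum(r * p for r, p in zip(predicted_revenue, purchase_probability))
    return weighted_revenue * margin - cost_per_catalog * len(predicted_revenue)


def should_send(profit: float, threshold: float = PROFIT_THRESHOLD) -> bool:
    """Go ahead with printing only if the expected profit exceeds the threshold."""
    return profit > threshold


if __name__ == "__main__":
    # Toy values for three customers, not the real mailing list.
    profit = expected_profit([500.0, 300.0, 200.0], [0.5, 0.2, 0.1])
    print(round(profit, 2), should_send(profit))
```

With the toy values the expected profit is $145.50, so `should_send` returns False. In a real workflow the two input lists would come from scoring the 250 mailing-list records.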

It would be a great help if someone could guide me on how to build the workflow in Alteryx. Once the model is developed, I would also like to know how to test it with a new dataset and how to publish the workflow to Alteryx Server/Gallery.

Thanks,

Karthik

apathetichell

Hi, are you missing your dependent variable, or is Score_Yes your dependent variable? If it is, it looks like it's designed as a 1/0 (true/false) value. Also, since you are building your model on your test data, you need to decide whether each record is a 1 (yes) or a 0 (no).

This is probably better suited to logistic regression or another yes/no style model (a tree model?). As a linear regression, it's not giving me much.
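To illustrate why a logistic model suits a 1/0 target, here is a minimal one-feature logistic regression in plain Python, fitted by batch gradient descent on the log-loss. Unlike a linear regression, its predictions always stay strictly between 0 and 1. This is a sketch with made-up toy data and hypothetical function names; it is not how Alteryx's Logistic Regression tool works internally (statistical packages such as R's glm typically use iteratively reweighted least squares instead).

```python
# Minimal one-feature logistic regression fitted by batch gradient descent on the
# mean log-loss. Illustrative only; the toy data and function names are hypothetical.
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs: List[float], ys: List[int],
                 lr: float = 0.1, epochs: int = 5000) -> Tuple[float, float]:
    """Return (intercept, slope) for 0/1 targets by minimising mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of log-loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [0, 0, 1, 0, 1, 0, 1, 1]
    b0, b1 = fit_logistic(xs, ys)
    for x in (1, 8):
        print(x, round(sigmoid(b0 + b1 * x), 3))  # predicted probability of a 1
```

The toy data are deliberately not perfectly separable, so the fitted coefficients converge to finite values instead of growing without bound.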

Report for Linear Model Linear_Regression_4

Basic Summary

Call:
lm(formula = Score_Yes ~ Customer_Segment + City + Store_Number + Avg_Num_Products_Purchased + X._Years_as_Customer, data = the.data)

Residuals:
    Min      1Q  Median      3Q     Max
-0.2375 -0.1152 -0.0659  0.0579  0.6975

Coefficients:
                                               Estimate Std. Error   t value  Pr(>|t|)
(Intercept)                                   0.3977648    0.59080   0.67326   0.50147
Customer_SegmentLoyalty Club Only             0.0076910    0.03021   0.25460   0.79927
Customer_SegmentLoyalty Club and Credit Card -0.0083786    0.04679  -0.17908   0.85804
Customer_SegmentStore Mailing List            0.0280017    0.05308   0.52752   0.59836
CityAurora                                   -0.0339467    0.04762  -0.71283   0.47669
CityBroomfield                                0.0392557    0.06591   0.59562   0.55203
CityCastle Pines                             -0.1718996    0.19680  -0.87349   0.38333
CityCentennial                               -0.0284672    0.07637  -0.37276   0.70968
CityCommerce City                            -0.1130052    0.14069  -0.80321    0.4227
CityDenver                                   -0.0320047    0.04700  -0.68090   0.49664
CityEdgewater                                 0.6482706    0.19597   3.30793   0.00109 **
CityEnglewood                                 0.0307923    0.10522   0.29266   0.77005
CityGolden                                   -0.0962932    0.14104  -0.68274   0.49548
CityGreenwood Village                        -0.0856867    0.19848  -0.43172   0.66636
CityHighlands Ranch                          -0.0605954    0.14340  -0.42255   0.67303
CityLakewood                                  0.0228237    0.05367   0.42524   0.67107
CityLittleton                                 0.0136295    0.07804   0.17464   0.86152
CityLouisville                                0.0165866    0.19656   0.08438   0.93283
CityNorthglenn                                0.1045096    0.11809   0.88501    0.3771
CityParker                                    0.0506121    0.10634   0.47593   0.63458
CityThornton                                  0.0485668    0.11772   0.41256   0.68033
CityWestminster                               0.0080556    0.08895   0.09056   0.92792
CityWheat Ridge                               0.0225780    0.09442   0.23913   0.81122
Store_Number                                 -0.0002043    0.00551  -0.03707   0.97046
Avg_Num_Products_Purchased                    0.0009407    0.00561   0.16768   0.86698
X._Years_as_Customer                         -0.0638804    0.04095  -1.55994   0.12019
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.19097 on 224 degrees of freedom
Multiple R-squared: 0.09166, Adjusted R-Squared: -0.009722
F-statistic: 0.9041 on 25 and 224 degrees of freedom (DF), p-value 0.5998

Type II ANOVA Analysis

Response: Score_Yes
                           Sum Sq   Df F value  Pr(>F)
Customer_Segment             0.01    3    0.11 0.95421
City                         0.69   19       1 0.46678
Store_Number                    0    1       0 0.97046
Avg_Num_Products_Purchased      0    1    0.03 0.86698
X._Years_as_Customer         0.09    1    2.43 0.12019
Residuals                    8.17  224
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
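As a sanity check on the overall fit statistics, the standard OLS formulas reproduce the reported F-statistic and adjusted R-squared from the multiple R-squared alone. This is a minimal sketch; n = 250 and p = 25 are inferred from the "25 and 224 degrees of freedom" line, and the function names are illustrative. It gives an F of about 0.9041 and an adjusted R-squared of about -0.0097 (the report's -0.009722 uses the unrounded R-squared). A negative adjusted R-squared means the predictors explain less than would be expected by chance, which is consistent with the p-value of 0.5998.

```python
# Recompute adjusted R-squared and the overall F-statistic from the reported
# multiple R-squared using the standard OLS formulas.
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared; n = observations, p = predictors excluding the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2: float, n: int, p: int) -> float:
    """F-statistic for the null hypothesis that every slope coefficient is zero."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


if __name__ == "__main__":
    r2 = 0.09166           # "Multiple R-squared" from the report
    p, resid_df = 25, 224  # numerator and denominator degrees of freedom
    n = p + resid_df + 1   # 250 records
    print(round(adjusted_r2(r2, n, p), 4), round(overall_f(r2, n, p), 4))
```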
KilianL
Alteryx Alumni (Retired)

Hi @Karthik_7694,

Great to see a predictive use case!

I created an example predictive workflow that covers checking data health, selecting variables, and testing the model on a validation dataset.

I used the customer list data to calculate spend, prepared the data for the predictive model, split it into training and validation datasets, and then used a linear regression to predict spend. At the end, the Score tool applies the model to the validation dataset, so you can compare actual spend with predicted spend.
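The split, fit, and score steps can be sketched in plain Python. This is a simplified stand-in, not the attached workflow: it uses a single numeric predictor and closed-form simple least squares, whereas the Linear Regression tool can take several predictors. The helper names and the toy data (spend = 10 + 2x) are hypothetical.

```python
# Simplified sketch of split -> fit -> score with one numeric predictor.
# ASSUMPTION: rows are hypothetical (predictor, spend) pairs; the toy data are made up.
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, float]


def train_validation_split(rows: Sequence[Row], train_frac: float = 0.8,
                           seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle reproducibly, then cut into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def fit_simple_ols(rows: Sequence[Row]) -> Tuple[float, float]:
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def score(model: Tuple[float, float], xs: Sequence[float]) -> List[float]:
    """Apply the fitted model to new records (analogous to the Score tool's role)."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


if __name__ == "__main__":
    data = [(float(x), 10.0 + 2.0 * x) for x in range(1, 21)]
    train, validation = train_validation_split(data)
    model = fit_simple_ols(train)
    predicted = score(model, [x for x, _ in validation])
    for (x, actual), pred in zip(validation, predicted):
        print(x, actual, round(pred, 2))
```

Because the toy data are exactly linear, the fitted intercept and slope come out as 10 and 2 and every validation prediction matches its actual value; with real customer data, the gap between actual and predicted spend is what you would inspect.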

I didn't get good results (an R-squared of about 0.5, meaning the model explains only about half of the variation in spend), and I think more data is needed. Looking at the model results, customer segment was the only significant predictor of spend. (Using average spend as a predictor is not best practice, because it already contains information about the target, spend.)
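R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares, so a value near 0.5 means the predictions remove about half of the squared deviation of actual spend from its mean. A minimal sketch for computing it on a validation set; the function name and toy values are illustrative.

```python
# R-squared = 1 - SS_res / SS_tot, computed from actual and predicted values.
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0, 4.0]
    print(r_squared(actual, [1.0, 2.0, 3.0, 5.0]))  # one prediction off by 1.0
    print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean
```

Always predicting the mean scores 0, so any useful model should beat that baseline on the validation data.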

If you have time series data, you could try time series forecasting to predict spend.

[Image: Karthik - predictive.jpg (example workflow)]

Please mark this as the solution if it answers your question; it will help others find solutions more quickly.


Kind Regards,
Kilian
Solutions Engineer - Alteryx