Hi all,
I have some SAS code for a stepwise regression that I want to replicate in Alteryx.
/ slstay=0.01 slentry=0.01 selection=stepwise include=8 ;
The include=8 means that the first 8 variables are always included.
Does anyone know how to do that in Alteryx?
Preferably by altering the existing stepwise regression.
And don't worry that the Alteryx version uses AIC or BIC to build the model; if I can force the first 8 variables in, I'm happy.
Best regards
Lars
Hi Lars,
Not a full answer at this stage, but I believe the syntax you need is described in this StackExchange article: (They call this a semi-constrained stepwise regression)
[Trimmed Quote follows]
"I think you can set up your base model... and then use add1()
with the remaining predictors. So, say you have a model mod1
defined like:
mod1 <- lm(y ~ x1+x2+x3)
then:
add1(mod1, ~ .+x4+x5+x6, test="F")
will add and test one predictor after the other on top of the base model."
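For anyone who wants to try the quoted add1() idea outside Alteryx first, here is a minimal sketch on R's built-in mtcars data. The choice of wt and hp as the "always in" predictors and cyl, disp and qsec as the candidates is purely an assumption for illustration; it is not taken from the original problem.

```r
# Base model holding the predictors that must always stay in
# (assumption for illustration: wt and hp)
mod1 <- lm(mpg ~ wt + hp, data = mtcars)

# Test each candidate predictor, one at a time, on top of the base model.
# The "." in the scope formula stands for the terms already in mod1.
candidates <- add1(mod1, scope = ~ . + cyl + disp + qsec, test = "F")
print(candidates)
```

Note that add1() only reports what a single added term would do; it does not pick a final model on its own, so it would have to be repeated (or wrapped in a loop) to mimic a full stepwise search.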
Can't see a way to achieve this directly in the Alteryx tool, so it might require editing of the Stepwise tool macro?
Hope this is a useful pointer, nonetheless?
Nick
Hi Nick,
Thanks, it looks like it could help. Of course it would have been nice if you had answered "download this tool" or "insert this piece of code..."
since I'm not that good with R, but I'll give it a try.
It's a bit of a side project for now, so it might take a few weeks, but I will get back when I know more.
Best regards
Lars
It is not possible to force variables into the stepwise regression tool as it currently exists; doing so would require an R tool with custom code. Setting up a user interface that would allow this wasn't possible historically (and the current version of this tool reflects that history), but recent changes in these tools may make it possible in the future. Now, why one would want to force variables into a model when they don't provide a true improvement in its predictive accuracy is another question.
I found a solution using the R-tool.
Nick, you did put me in the right direction, but I'm not really sure how credit is applied in here...
Why force in variables?
Maybe they are important to the client, or maybe the model just doesn't make sense without them, something like predicting the number of babies without storks in the equation. If so, should a test with an arbitrarily chosen p-value of 0.05, or some other test statistic, leave out that most important variable, just because real life interferes with statistical noise in the data?
In the example below, the log price is estimated from time t, two important variables (important1 and important2), and a lot of other variables named x1 through xn.
library(MASS)
#Load the data, copy it, and keep only the valid rows that should be used to fit the model. Later I predict the value on the entire dataset
myData <- read.Alteryx("#1",mode="data.frame")
myNewData <- myData
myNewData <- subset(myNewData, IsValid=='1')
#Not really sure if the base model is necessary... In 'upper' you put all the variables you want to test.
#In 'lower' you put those you want in for sure.
#Notice k=5 (compared to the default k=2): each extra term is penalized more heavily, so fewer optional variables are accepted
#(except of course for variables in the lower model, which always stay in)
MyBaseModel <- lm(logPrice~ t + important1 + important2 + x1 +x2+…+xn, data= myNewData)
MyStepModel <- stepAIC(MyBaseModel, scope=list(upper = ~ t + important1 + important2 + x1 +x2+…+xn,
lower = ~ t + important1 + important2), k=5, trace= FALSE, direction ="both")
#This makes a table for entered variables in the model with name, estimate, p-value etc.
MySummary <- summary(MyStepModel)
MySummary <- MySummary$coefficients
MySummary <- as.data.frame(MySummary)
#Store the predicted value in the dataset using the $Varname notation, then reduce the dataset to the relevant variables: the record id, the actual price and the predicted price, called fit. Notice it was logPrice that was estimated, hence the exp() of the predicted values
myData$fit <- round(exp(predict(MyStepModel, myData)),0)
myData <- myData [c("Id", "Price", "fit")]
#Write the data with the predicted value to output 1 and the parameter estimates to output 2
write.Alteryx(myData,1)
write.Alteryx(MySummary,2)
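Since read.Alteryx() and write.Alteryx() only exist inside the Alteryx R tool, here is a small stand-alone sketch of the same stepAIC() pattern that can be run in plain R to check the behaviour. It uses the built-in mtcars data; treating wt and hp as the forced variables is an assumption made only for this illustration.

```r
library(MASS)

# Start from the full model: forced predictors (assumed: wt, hp)
# plus the optional candidates
full_model <- lm(mpg ~ wt + hp + cyl + disp + qsec + drat, data = mtcars)

# 'lower' lists the terms that may never be dropped,
# 'upper' lists everything the search is allowed to use
step_model <- stepAIC(
  full_model,
  scope = list(upper = ~ wt + hp + cyl + disp + qsec + drat,
               lower = ~ wt + hp),
  k = 2,               # AIC penalty; k = log(nrow(mtcars)) would give BIC
  direction = "both",
  trace = FALSE
)

# Terms that survived the stepwise search
kept <- attr(terms(step_model), "term.labels")
print(kept)
```

Whatever the search does with the optional terms, wt and hp remain in the final model, which is the behaviour that include= provides in the SAS statement.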
That definitely works. I've been thinking about this a bit more, and there may be a way to alter the existing stepwise tool to allow for this. It is something we will look into for a future release.
Having read this post, I want to make sure I understood correctly. It looks like the stepwise tool was replaced with the custom code in the R tool as it is not possible to force variables within the stepwise tool.