I am trying to get the values of the variables in the boosted Variable Importance plot, but the plot doesn't show them. How can I add the values to the Variable Importance plot so I can use them in further calculations?
Hi @amrutas,
The image you've posted looks like it came from a Forest model. Are you trying to extract the variable importance values from a boosted model, random forest, or both? I ask because the process to do so will be slightly different depending on the model.
Either way, at a high level you will want to:
1. Connect an R tool to the O anchor of the tool you used to generate your model. The O stands for Object, and this is where the model object is streamed onto the canvas.
2. Write R code into the tool to read in the model and unserialize the object.
3. Load the R package that corresponds to your model: randomForest for a random forest model, or gbm for a boosted model.
4. Apply the corresponding R function to the model to extract the variable importance metric. This is importance() for a random forest and summary() for boosted models.
5. Write the resulting data back out to Alteryx.
The code to do this for a random forest model:
# Read in serialized model object
modelObj <- read.Alteryx("#1", mode="data.frame")
# Unserialize model
model <- unserializeObject(as.character(modelObj$Object[1]))
# load library
library(randomForest)
# extract feature importance (stored under a name that does not shadow importance())
imp <- importance(model)
# convert to data frame, keeping the variable names as a column
output <- data.frame(var = row.names(imp), imp)
# write out data frame
write.Alteryx(output, 1)
The code to do this for a boosted model:
# Read in serialized model object
modelObj <- read.Alteryx("#1", mode="data.frame")
# Unserialize model
model <- unserializeObject(as.character(modelObj$Object[1]))
# load package
library(gbm)
# extract variable importance (plotit = FALSE returns the table without drawing the plot)
output <- summary(model, plotit = FALSE)
# write out to Alteryx
write.Alteryx(output, 1)
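To see what summary() returns for a boosted model before wiring it into a workflow, here is a minimal standalone sketch. It runs in a plain R session rather than the R tool, and it fits an illustrative model on the built-in mtcars data; the name demo_model, the formula, and the tuning values are assumptions for the example, not what the Boosted Model tool uses.

```r
# Standalone sketch: runs in a plain R session, outside Alteryx
library(gbm)

set.seed(42)  # gbm subsamples rows, so fix the seed for repeatability

# Illustrative boosted model on a built-in dataset (not the Alteryx model)
demo_model <- gbm(
  mpg ~ wt + hp + disp + qsec,
  data = mtcars,
  distribution = "gaussian",
  n.trees = 100,
  n.minobsinnode = 3,  # small value because mtcars has only 32 rows
  bag.fraction = 0.8
)

# plotit = FALSE returns the table without drawing the bar chart
rel_inf <- summary(demo_model, plotit = FALSE)

# Two columns: var (variable name) and rel.inf (relative influence)
print(rel_inf)
```

With the default normalize = TRUE, the rel.inf column is scaled to sum to 100, so each value can be read as a percentage share when used in further calculations.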
I've attached a workflow that demonstrates the process for both a boosted model and a random forest model. Please be sure to read the documentation on each of the functions used to extract the variable importance metrics (hyperlinked above).
Hope this helps!
Sydney
Hey Sydney, your R code works to extract the values for the variable importance, thank you! My question is about the other variable importance measure besides Mean Decrease Gini (IncNodePurity), which is Mean Decrease Accuracy (%IncMSE). Would you be able to write R code that derives those values for a Forest Model? I've read that the Gini index is biased towards variables with numerous split points, and I wanted to use %IncMSE to add another layer to my analysis.
*Not sure if you remember me; we met at the conference in Nashville and you helped me with the Importance Weights tool.
Hi @h_kee,
Great to hear from you! Were you able to get the Importance Weights tool working on your machine?
Regarding your question, according to the documentation the same function (importance()) should be able to extract both the mean decrease in accuracy and the mean decrease in node impurity. Adding the argument type = 1 extracts the mean decrease in accuracy, and type = 2 returns the mean decrease in node impurity. Not specifying a type argument (as in the example code in this thread) returns all importance metrics associated with the model.
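As a rough illustration of the type argument, here is a standalone sketch that runs outside Alteryx. It assumes a regression forest fitted on the built-in mtcars data with importance = TRUE so that both metrics exist; demo_forest and the other names are made up for the example.

```r
# Standalone sketch: runs in a plain R session, outside Alteryx
library(randomForest)

set.seed(42)  # random forests are randomized, so fix the seed

# Illustrative regression forest; importance = TRUE computes both metrics
demo_forest <- randomForest(mpg ~ ., data = mtcars,
                            ntree = 200, importance = TRUE)

# type = 1: mean decrease in accuracy (%IncMSE for regression)
acc <- importance(demo_forest, type = 1)

# type = 2: mean decrease in node impurity (IncNodePurity for regression)
purity <- importance(demo_forest, type = 2)

# No type argument: every importance metric stored on the model
all_metrics <- importance(demo_forest)

# check.names = FALSE keeps the "%IncMSE" column name intact
output <- data.frame(var = row.names(all_metrics), all_metrics,
                     check.names = FALSE)
print(output)
```

For a classification forest the same calls apply, but the columns are named MeanDecreaseAccuracy and MeanDecreaseGini instead.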
Only the mean decrease in node impurity is returned by the code in this thread because it is the only importance metric attached to the random forest model object coming out of the Forest Model tool.
I checked the code that generates the random forest models in the Forest Model tool, and found that the mean decrease in accuracy metric isn't calculated because the argument importance = TRUE is not set. I modified the Forest Model tool so that the importance = TRUE argument is set, and the code in this thread now works as expected for extracting the mean decrease in accuracy.
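The effect of that argument can be reproduced outside Alteryx with a small sketch like the one below. It uses the built-in mtcars data and made-up object names (default_fit, full_fit), and it is not the Forest Model tool's actual code; it only compares which metrics end up on the model object.

```r
# Standalone sketch: runs in a plain R session, outside Alteryx
library(randomForest)

# Fit without importance = TRUE (the randomForest default)
set.seed(1)
default_fit <- randomForest(mpg ~ ., data = mtcars, ntree = 100)

# Fit with importance = TRUE so the permutation-based metric is computed
set.seed(1)
full_fit <- randomForest(mpg ~ ., data = mtcars, ntree = 100,
                         importance = TRUE)

# Compare which metrics each model object carries
default_cols <- colnames(importance(default_fit))
full_cols <- colnames(importance(full_fit))

print(default_cols)  # node impurity only
print(full_cols)     # accuracy-based metric plus node impurity
```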
I've attached an example workflow with the modified Forest Model tool and working R code. You should be able to use the modified Forest Model tool included in the packaged workflow without any trouble. That being said - because I've changed the code in it, this tool is not officially supported (so don't overwrite your default Forest Model tool with it.)
Thanks!
Sydney
I was not able to get that tool to work but it's fine; you were very helpful though!
I have Alteryx 2019.1, which isn't the latest version, so I'm not able to download the modified Forest Model tool. Not to burden you with more work but if you're able to make it compatible with that version that'll be very helpful, if not, I can see about getting the latest version installed. Thanks again for your help, you are brilliant to say the least!
Hi @h_kee,
I down-versioned the packaged workflow to 2018.4 - you should be able to open it without issue now, but please let me know if anything weird happens 🙂
Hey Sydney,
Your answer also helps me though I have another luxury question for general understanding.
Is it possible to use the Python coding module to do what you have written there, or does it have to be an R customization module?
Best,
Atamert
Hi @atamertarslan,
All of the tools in the current predictive suite (the "brown tools" as well as the time series tools, AB testing tools, and prescriptive tools) are primarily written in the R programming language. The model object output of a predictive tool is a serialized R object, so if you'd like to work with the model objects generated by those tools, you will need to do so in R.
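To give a feel for why a serialized R object has to be read back in R, here is a small sketch using base R's serialize() and unserialize(). This is only an analogy: the exact encoding Alteryx uses for the Object field (and what unserializeObject() does internally) is not shown in this thread, so treat the details as assumptions.

```r
# Standalone sketch using base R only; not the Alteryx serialization format

# Any R model object will do; a small linear model keeps the example quick
fit <- lm(mpg ~ wt, data = mtcars)

# serialize() turns the object into a raw byte vector in R's own format
bytes <- serialize(fit, connection = NULL)

# unserialize() rebuilds an equivalent R object from those bytes
restored <- unserialize(bytes)

# The restored model carries the same coefficients as the original
same_coefs <- identical(coef(fit), coef(restored))
print(same_coefs)
```

The bytes encode R-specific structures (classes, formulas, attributes), which is why other languages cannot simply load them as a model.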
You can actually see the underlying R code for any of the predictive tools by right-clicking on the tool and selecting the option Open Macro from the drop-down menu. This will open the workflow that executes the logic of the tool. The bulk of most predictive tools' functionality comes from an R tool within the macro.
Does this answer your question? Please let me know if there is anything I can expand on to help your understanding.
Thanks,
- Sydney
Hi Sydney,
Is this possible for SVM? Is this a part of the output?
I haven't been able to extract the values, but then again R isn't something I'm that familiar with...
Alexander.
Hi @Alexandersd,
The variable importance metric is not a part of the SVM algorithm (it is a metric included by default for random forest and boosted models), but you can use the same general process to extract other values and metrics from the SVM tool. You can read more about using custom R code to extract additional information from the predictive tools here.
If you are looking to determine variable importance for an SVM model, this resource might also be of interest to you:
https://stats.stackexchange.com/questions/2179/variable-importance-from-svm
Thanks,
Sydney