
Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Boosted Model and Score Tool - How to deal with NULL/NA

Asteroid

Hi,

 

Recently I took a dataset (dataset 1) and split it into training and validation sets to train a boosted model.

Once the boosted model was trained, I used the Score tool to score a new dataset (dataset 2) with it.

 

In dataset 2, some of the variables have no value (blank/NA), but the Score tool still produces a score for every row.

So I would like to know how the boosted model and the Score tool deal with missing values and still produce a score.

 

As you know, with logistic regression, if one of the variables is empty, the Score tool cannot produce a score for that row.

 

Thank you.

ACE Emeritus

I'm not an expert, but I think this is a feature of those models in R more so than of the Alteryx implementation thereof.  I've attached a workflow that I used to play around with it a bit; it uses the Kaggle Titanic data (since it's small and fits the bill in terms of generating NULL predictions).  In its current state, everything is cleaned up so that missing values are either imputed or excluded as features of the model.

 

In particular, I saved a copy of the Score tool (which is just a macro - you can right-click it to open it and see the R code), and commented out several lines of R that explicitly generate log messages if/when NA values are removed.  Scoring with either macro always gave exactly the same results, which, again, leads me to think it has more to do with R than with Alteryx.  I also Googled it a bit in hopes of finding a definitive statement on the matter, but nothing jumped out immediately from that brief effort.

 

Anyway, hope that helps at least a little.  Aside: it also helps to enable logging and look at the logs closely.

 

Alteryx

After looking at the Boosted Model macro, it seems JohnJPS is correct: the NA handling is done in R.

 

The R package used by the Boosted Model tool is 'gbm', so this seems highly relevant:

http://stackoverflow.com/questions/14718648/r-gbm-handling-of-missing-values

 

Essentially, at each split in a tree, gbm sends rows whose value for the split variable is missing to a separate node. The score for those rows is the same as it was before that split.
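
To make the idea concrete, here is a minimal Python sketch of a tree whose splits have a third "missing" branch. It is only an illustration of the behavior described above, not gbm's actual code: the class names, the `age` feature, the threshold, and all leaf values are made up for the example.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: float  # score contribution of this terminal node


@dataclass
class Split:
    feature: str      # name of the variable tested at this node
    threshold: float  # rows with feature < threshold go left
    left: "Node"
    right: "Node"
    missing: "Node"   # rows with no value for `feature` go here


Node = Union[Leaf, Split]


def is_missing(x) -> bool:
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def predict(node: Node, row: dict) -> float:
    """Walk the tree; a missing value follows the 'missing' branch."""
    while isinstance(node, Split):
        x = row.get(node.feature)
        if is_missing(x):
            node = node.missing
        elif x < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.value


# Hypothetical one-split tree. The missing branch carries the score the
# row had before the split (0.1 here), so the split changes nothing for it.
tree = Split(
    feature="age",
    threshold=30.0,
    left=Leaf(-0.4),
    right=Leaf(0.6),
    missing=Leaf(0.1),
)

scores = [predict(tree, r) for r in ({"age": 25.0}, {"age": 40.0}, {"age": None}, {})]
```

Because every row ends up in some node, every row gets a score, which matches what the Score tool shows for rows with blank/NA variables.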

Alteryx Alumni (Retired)

Hi @qqqwww,

 

I was actually about to respond with essentially the same answer as @DylanB, but he beat me to the punch! It's also good to know that the Boosted Model tool isn't the only Predictive tool that can handle missing data. The Decision Tree and Naive Bayes Classifier tools also rely on built-in R procedures to handle missing data.
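
As a rough illustration of how a classifier can cope with missing inputs, the Python sketch below shows one common approach for Naive Bayes: leave a missing feature out of the product of likelihoods. This is an assumption made for the example, not a statement of exactly what the Alteryx tool or its R package does, and the priors, feature names, and probabilities are all invented.

```python
import math

# Hypothetical class priors and per-class likelihood tables.
PRIORS = {"yes": 0.4, "no": 0.6}
LIKELIHOODS = {
    "sex": {"yes": {"f": 0.7, "m": 0.3}, "no": {"f": 0.2, "m": 0.8}},
    "cabin_class": {"yes": {"1": 0.5, "3": 0.5}, "no": {"1": 0.2, "3": 0.8}},
}


def posterior(row: dict) -> dict:
    """Return class probabilities, ignoring features whose value is None."""
    log_scores = {}
    for label, prior in PRIORS.items():
        total = math.log(prior)
        for feature, table in LIKELIHOODS.items():
            value = row.get(feature)
            if value is None:
                continue  # missing feature contributes nothing to the score
            total += math.log(table[label][value])
        log_scores[label] = total
    # Normalize so the class probabilities sum to 1.
    norm = sum(math.exp(s) for s in log_scores.values())
    return {label: math.exp(s) / norm for label, s in log_scores.items()}


full = posterior({"sex": "f", "cabin_class": "1"})
partial = posterior({"sex": "f", "cabin_class": None})
empty = posterior({})
```

With this approach a row with every feature missing simply falls back to the class priors, while a partially filled row is scored from the features it does have.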

 

Best,
Bridget

Bridget Toomey

Research Scientist, Analytic Products

Alteryx
Asteroid

Thank you, John. Your workflow is good.

Asteroid

Thank you, everyone.  I learned something new today.
