Please be patient with me as I am somewhat new here, but thank you in advance for any help!
In this case I am using a decision tree, but I have tested a few different algorithms and hit the same problem: the model scores only a small percentage of the records it is given, and the rest come back null.
I figure it could have to do with missing values in fields that are integral to the model. I summarized both the records that received scores and those that didn't (summary attached): some fields, like Age, are much better populated in the scored records, but many are fairly equivalent between the two groups.
If this is the problem, is there a hyperparameter I could change that would lessen the model's need for complete data?
Is it possible that this is not the problem? What else could it be?
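One quick way to test the missing-value hypothesis is to compare null rates per field between the scored and unscored records. A minimal sketch, assuming the data is in a pandas DataFrame with a (hypothetical) `score` column holding the model output; the field names and values here are made up:

```python
import pandas as pd
import numpy as np

# Toy data standing in for the real accounts table; "score" is null
# wherever the model failed to produce a score.
df = pd.DataFrame({
    "Age":    [34, 51, np.nan, 42, np.nan, 29],
    "Income": [50_000, np.nan, 61_000, np.nan, np.nan, 72_000],
    "score":  [0.8, 0.3, np.nan, 0.6, np.nan, np.nan],
})

scored = df[df["score"].notna()]
unscored = df[df["score"].isna()]

# Fraction of nulls per feature, side by side for the two groups.
comparison = pd.DataFrame({
    "scored_null_rate": scored.drop(columns="score").isna().mean(),
    "unscored_null_rate": unscored.drop(columns="score").isna().mean(),
})
print(comparison)
```

If a feature's null rate is sharply higher in the unscored group, that feature is a likely culprit.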
Thanks so much for the quick response. I have attached the scored accounts (I omitted only two columns with sensitive data, and those were 100% populated).
Just to confirm: since the model "works" in the sense that it assigns scores to some accounts (at roughly 84% accuracy), the missing values we are both referring to are in the hold-out records being scored, correct?
Is there a "best practice" for imputing, or assigning values to, the nulls on the records being scored? Is there a standard tool or method for this?
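A common approach is to impute the scoring-time nulls with statistics learned from the training data only, never from the records being scored. A small sketch with invented column names, using training-set medians:

```python
import pandas as pd
import numpy as np

# Hypothetical training and scoring sets; column names are made up.
train = pd.DataFrame({"Age": [34, 51, 42],
                      "Income": [50_000, 61_000, 72_000]})
to_score = pd.DataFrame({"Age": [np.nan, 29],
                         "Income": [55_000, np.nan]})

# Learn fill values from the training data, then apply to the scoring set.
medians = train.median()
filled = to_score.fillna(medians)
print(filled)
```

For a more standard tool, scikit-learn's `SimpleImputer` does the same fit-on-train / transform-on-score pattern and also supports mean, most-frequent, and constant strategies.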
What I would look at first is one row that did not score.
Confirm that, for the combination of dimensions in that unscored record, there are records with the same dimensions going into the model build. If there are, check how many; it is plausible that the sample is too small or nonexistent.
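That check can be sketched directly: take the dimension values from the unscored row and count how many training records share the exact combination. The column names and values below are invented for illustration:

```python
import pandas as pd

# Hypothetical training data with two categorical dimensions.
train = pd.DataFrame({
    "Region":  ["East", "East", "West", "West", "West"],
    "Segment": ["A",    "B",    "A",    "A",    "B"],
})

# Dimension values taken from one record that failed to score.
unscored_row = {"Region": "East", "Segment": "B"}

# Count training rows matching every dimension of the unscored record.
mask = pd.Series(True, index=train.index)
for col, val in unscored_row.items():
    mask &= train[col] == val
n_matches = int(mask.sum())
print(n_matches)
```

If `n_matches` is zero or very small, the model likely never learned that combination, which would explain the null score.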