Alteryx Designer Desktop Discussions

SOLVED

Allow higher number of distinct values for categorical predictors in classification

apimentel
5 - Atom

Hello,

I am using a Naive Bayes Classifier in Alteryx and found that I cannot use categorical fields with more than 50 distinct values as predictors. I get this error: "ngrid1:50 is less than the number of levels in Test", where Test is my categorical predictor field.

Is it possible to force this value to be higher?  If so, could you indicate the steps to do this?

I would also like to mention that the same error happens with the random forest.

Thank you,

DrDan
Alteryx Alumni (Retired)

@apimentel, this is a hard limit in the naiveBayes function of the e1071 R package that is used to implement the model. A similar hard limit on the number of categories per variable exists in the Forest Model tool as a result of the underlying R package. The reason is that the combinatorics in these algorithms get out of hand when many levels are involved. In addition, when a categorical variable has a lot of levels, many of those levels often have only a small number of counts and become unreliable predictors. My advice is to consolidate the categories into a smaller number, making sure that each category has a reasonable number of counts.
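
As an illustration of the consolidation advice above, here is a minimal sketch in plain Python (not Alteryx or R code). The function name `consolidate_levels`, its parameters, and the `"Other"` label are hypothetical choices for this example; the default of 50 levels simply mirrors the number in the error message, and the minimum count of 5 is an arbitrary assumption that you would tune to your data.

```python
from collections import Counter


def consolidate_levels(values, max_levels=50, min_count=5, other_label="Other"):
    """Lump rare categorical levels into a single catch-all level.

    Hypothetical helper: keeps the most frequent levels that have at least
    `min_count` rows and maps everything else to `other_label`, so the
    result has at most `max_levels` distinct values.
    """
    counts = Counter(values)

    # Levels with enough rows, most frequent first
    # (ties broken alphabetically so the result is deterministic).
    eligible = sorted(
        (level for level, n in counts.items() if n >= min_count),
        key=lambda level: (-counts[level], str(level)),
    )

    keep = set(eligible)
    # If anything has to be lumped, reserve one slot for the catch-all level.
    if len(keep) < len(counts) or len(keep) > max_levels:
        keep = set(eligible[: max_levels - 1])

    return [v if v in keep else other_label for v in values]


# Small made-up sample: two common levels and two rare ones.
sample = ["a"] * 10 + ["b"] * 8 + ["c"] * 2 + ["d"] * 1
reduced = consolidate_levels(sample, max_levels=3, min_count=5)
print(Counter(reduced))
```

In this sample the rare levels `c` and `d` are merged into `Other`, while `a` and `b` are kept as they are. The consolidated field would then be used as the predictor in place of the original one.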
