This is a hard limit in the naiveBayes function of the e1071 R package that is used to implement the model. A similar hard limit on the number of categories per variable exists in the Forest Model tool, again as a result of the underlying R package. The reason is that the combinatorics in these algorithms get out of hand when a variable has many levels. In addition, when a categorical variable has a lot of levels, many of those levels often have only a small number of counts and become unreliable predictors. My advice is to consolidate the categories into a smaller number, making sure that each category has a reasonable number of counts.
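As a rough illustration of that consolidation step, here is a minimal Python sketch that pools every level seen fewer than a chosen number of times into a single catch-all category. The function name `consolidate_rare_levels`, the `min_count` threshold of 30, the "Other" label, and the sample city data are all assumptions made for the example. They are not part of Alteryx, e1071, or the Forest Model tool.

```python
from collections import Counter


def consolidate_rare_levels(values, min_count=30, other_label="Other"):
    """Return a copy of values with rare levels replaced by other_label.

    A level is treated as rare when it occurs fewer than min_count times.
    Both min_count and other_label are illustrative defaults, not
    recommendations from any package.
    """
    counts = Counter(values)
    # Levels with enough observations are kept under their own name.
    keep = {level for level, n in counts.items() if n >= min_count}
    return [v if v in keep else other_label for v in values]


# Hypothetical data: two well-populated levels and two sparse ones.
cities = ["Boston"] * 40 + ["Denver"] * 35 + ["Reno"] * 3 + ["Tulsa"] * 2

merged = consolidate_rare_levels(cities, min_count=30)

# Inspect the level counts after consolidation.
print(Counter(merged))
```

Note that the pooled category can still fall below the threshold, as it does here with only five rows, so it is worth checking the counts afterwards. Where the levels have a natural grouping (for example, cities into regions), merging along that grouping is usually more meaningful than a generic catch-all.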