Predictive Analytics-Naive Bayes-Error "apply(log(vapply(seq_along(attribs)"

goutdelete
8 - Asteroid

Hi,

 

I am trying to use the Naive Bayes Classifier tool for a classification analysis (let's call it good customer/bad customer), but I keep getting different error messages. At first it was a data out-of-bounds error during scoring; now it's this: "Naive Bayes Classification: Error in apply(log(vapply(seq_along(attribs), function(v) { :"

 

 
 

Picture1 - Copie.png

 

When I ran a simple test on the Titanic data, everything seemed to be fine, though:

Picture2 - Copie.png

 

I tried reducing the attributes, both numeric and text (categorical), but the error message stays the same.

Does anybody know what might be causing these error messages?

 

Thanks much!

acarter881
12 - Quasar

Hello, @goutdelete.

 

I think it will be easier to debug if we're given a sample workflow. It's tough to say, as I don't see those keywords used in the code within the R tool that's used within the Naive Bayes tool.

goutdelete
8 - Asteroid

Thanks very much for the reply, @acarter881! Let me prep the data and maybe trim some elements so I don't accidentally put our customer data out there.

 

By the way, there is a post on Stack Overflow with nearly the same message, although I wasn't sure whether it applies to my case:

r - Error in apply(log(sapply(seq_along(attribs), function(v) { : dim(X) must have a positive length...

acarter881
12 - Quasar

You're welcome, @goutdelete. I saw that as well (it isn't a complete match to your error, but it is close), but I'm sure it's something I should be able to fix if I had the workflow. 🙂

goutdelete
8 - Asteroid

@acarter881, upon further investigation of the data prep, I realized I had added a step to cleanse the null values and forgot to update the labeling logic (from "isnull() then bad" to "!= 0 then bad"). As a result, the target accidentally ended up with only one outcome: all good. I suppose it is indeed the same problem as in the Stack Overflow thread, with only one dimension; Naive Bayes needs at least two outcome classes to perform the analysis. See the two pics below:

Picture1 - Copie.png

Picture2 - Copie.png
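The single-outcome problem described above can be caught before the data reaches the model. The following is a minimal sketch in plain Python, not the code that runs inside the Naive Bayes tool; the function name `check_target_classes`, the field name `outcome`, and the sample rows are all hypothetical.

```python
from collections import Counter


def check_target_classes(rows, target):
    """Count the target classes and fail early if fewer than two are present."""
    counts = Counter(row[target] for row in rows)
    if len(counts) < 2:
        # A classifier cannot learn to separate classes it never sees.
        raise ValueError(
            f"Target '{target}' has only {len(counts)} class(es): {dict(counts)}"
        )
    return counts


# Hypothetical sample: the labeling rule worked and produced both outcomes.
rows = [
    {"id": 1, "outcome": "Good"},
    {"id": 2, "outcome": "Good"},
    {"id": 3, "outcome": "Bad"},
    {"id": 4, "outcome": "Good"},
]
print(check_target_classes(rows, "outcome"))

# Hypothetical sample: a broken labeling rule marked every record as Good.
all_good = [{"id": i, "outcome": "Good"} for i in range(5)]
try:
    check_target_classes(all_good, "outcome")
except ValueError as err:
    print("Caught:", err)
```

A check like this would have flagged the "all good" target right after the labeling step, instead of surfacing later as an error inside the model.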

Nonetheless, after the correction I seem to have a new problem: the confusion matrix only shows half of the table, and I'm pretty sure it's wrong.

 

Picture4 - Copie.png

Picture3 - Copie.png

 

I attached both in the sample workflow below. 

 

Thanks!

acarter881
12 - Quasar

Hello, @goutdelete.

 

I believe these are the only tools you need in your top-most example; the other tools have no effect on the data (see first screenshot).

 

I think you may want to look into oversampling; 23 Bad records are likely not enough. Even if you balanced the data 50/50 by oversampling Bad, you would have only 46 records in total going into the model, which is unlikely to produce useful results. The Naive Bayes macro uses 500 records (see second screenshot), split into 252 Yes and 248 No. This is likely the kind of split you want (i.e., roughly 50/50), and it shows that you need more records going into the model.

image.png

 

image.png
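The arithmetic behind the 46-record figure can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of any Alteryx tool. It assumes the 50/50 balance is reached by sampling the larger class down to the size of the smaller one; the function name `balance_by_downsampling`, the field name `outcome`, and the count of 300 Good records are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_downsampling(rows, target, seed=0):
    """Sample every class down to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    groups = defaultdict(list)
    for row in rows:
        groups[row[target]].append(row)
    smallest = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# 23 Bad records as in the thread; the 300 Good records are a made-up count.
rows = [{"id": i, "outcome": "Bad"} for i in range(23)]
rows += [{"id": 23 + i, "outcome": "Good"} for i in range(300)]

balanced = balance_by_downsampling(rows, "outcome")
print(len(balanced))  # 46: 23 Bad plus 23 Good
```

However many Good records are available, the balanced set can never be larger than twice the minority count, which is why more Bad records are needed.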

goutdelete
8 - Asteroid

Hi @acarter881, thanks for the input; let me look into it.

 

However, I don't think removing the other tools would make a difference. The Select tool is just left over from my original dataset, which has many more fields. The Imputation tool, on the other hand, is necessary: it's the only way I could think of to replace NA values with something like the average without going the Python route.
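For reference, replacing missing values with the average looks roughly like this in plain Python. This is only a sketch of the idea, not what the Imputation tool does internally; the function name `impute_mean`, the field name `income`, and the sample values are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean


def impute_mean(rows, field):
    """Return copies of the rows with missing values in `field` set to the mean."""
    observed = [row[field] for row in rows if row[field] is not None]
    if not observed:
        raise ValueError(f"Field '{field}' has no observed values to average")
    fill = mean(observed)
    return [
        {**row, field: fill if row[field] is None else row[field]}
        for row in rows
    ]


# Hypothetical sample with one missing value.
rows = [
    {"id": 1, "income": 10.0},
    {"id": 2, "income": None},
    {"id": 3, "income": 30.0},
]
imputed = impute_mean(rows, "income")
print(imputed[1]["income"])  # 20.0, the mean of 10.0 and 30.0
```

The mean is computed from the observed values only, and the input rows are left unchanged.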

 

My original data (still only a subset) has over 2,000 records, so I feel it should be sufficient. The count of 23 is probably small because I used a random sample tool to trim the size of the sample. However, the data is indeed disproportionate, since it's a good/bad customer type of analysis. Any business would be in big trouble if it were 50/50. :)

 

On the other hand, I got a new error and error message today after changing the name of the result. The confusion matrix table showed up (but it is still wrong), and the scoring error went back to what I got before, something like "subscript out of bounds".

 

Picture1 - Copie.png

Picture4 - Copie.png

In this original subset, there should be 143 Yes and 2014 No, so the table is still wrong, unfortunately.
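One possible explanation for a half-empty confusion matrix, which is only an assumption and not confirmed by the screenshots, is that the model predicts the majority class for every record. The sketch below uses the 143 Yes / 2014 No counts from this subset with a hypothetical set of all-"No" predictions to show what that looks like; the function name `confusion_matrix` is illustrative and is not the tool's own code.

```python
def confusion_matrix(actual, predicted, labels):
    """Build a nested dict: matrix[actual_label][predicted_label] = count."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


labels = ["Yes", "No"]
actual = ["Yes"] * 143 + ["No"] * 2014

# Hypothetical predictions: a model that always answers with the majority class.
predicted = ["No"] * len(actual)

matrix = confusion_matrix(actual, predicted, labels)
for a in labels:
    print(a, matrix[a])

correct = sum(matrix[label][label] for label in labels)
accuracy = correct / len(actual)
print(f"accuracy: {accuracy:.3f}")
```

Passing the full label list keeps the never-predicted class in the table as a column of zeros instead of dropping it. In this scenario the accuracy is roughly 93% even though no Yes record is ever identified, which is why class balance matters more than the raw record count.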
