Hi all,
Here's some background detail to kick things off:
I am faced with the following error: "Error in FileYXDBStreaming:: :Read - Unexpected number of bytes" when Alteryx tries to process the .yxdb file (mentioned above) in my workflow.
Is there a maximum file size my local machine can handle? If so, is it based on RAM (I have 16 GB)? I am keen to hear any suggestions on how to resolve this issue (preferably without rerunning Part 1 of the background given above, as this was a very time-consuming process).
Thanks for any advice you can provide.
Kind regards,
Ben
That's quite the Forest you have there. I've never seen a model object that large before, but yxdbs can be huge (I wrote a 160GB yxdb yesterday).
Just to make sure: you saved off the model object from the Forest Model tool ("O" output anchor) as an 843 MB yxdb file. You then Input that yxdb and connected it to the "M" input on the Score tool, with the data stream in the "D" input on the Score tool? And you verified the data going into the Score tool has the same field schema as the training set?
As the next troubleshooting step, I would verify there isn't an issue with the saved model object or a difference between the data streams. Instead of saving the model and scoring in a different workflow, test whether you can score the Forest Model right after training it (yes, that means re-running it). So in the same workflow where you train the Forest Model, add the Score tool, connect the model object directly, and also use the same data input connection that feeds the Forest Model tool to feed the Score tool. If this works, the next step I would take is to verify the data streams are the same in both workflows.
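To illustrate the logic of this check outside of Alteryx, here is a rough Python/scikit-learn analogy (an assumption on my part, purely illustrative; Alteryx's Forest Model is not scikit-learn and all names below are made up). The point is the same: score immediately after training, with the same data stream, so any failure points at the data or schema rather than the saved model object.

```python
# Illustrative analogy only -- NOT Alteryx internals.
# "Score right after training" sanity check, sketched in scikit-learn.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Training data (by analogy, the "D" input to the Forest Model tool)
train = pd.DataFrame({
    "x1": [0.1, 0.9, 0.4, 0.7],
    "x2": [1.0, 0.2, 0.8, 0.3],
    "y":  [0, 1, 0, 1],
})
features = ["x1", "x2"]

# Train the forest (the "Forest Model" tool step)
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(train[features], train["y"])

# Score in the same "workflow" -- no save/load round trip, so a
# failure here would implicate the data, not a saved model object.
scores = model.predict(train[features])

# Separately, verify a new scoring stream has the same field schema
# as the training set before scoring it.
new_data = pd.DataFrame({"x1": [0.5], "x2": [0.6]})
assert list(new_data.columns) == features, "field schema mismatch"
new_scores = model.predict(new_data[features])
```

If the in-workflow score succeeds but the save/load version fails, that narrows the problem to the serialized model file (your yxdb) or to how it is being read back in.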
Like this:
That being said, you could start out with a limited sample set to make sure things are running as expected. That way you avoid the 55-hour timeline while testing.
Hi Charlie,
Really appreciate the detailed response.
The first part of your confirmation statement is true; however, I input the yxdb file into the "D" input (within my larger flow), with my data to be scored in the "M" input. Should this be the other way around? The way you've structured your example workflow suggests I have things connected the wrong way around. Lastly, I can confirm the schema for both datasets is identical, so we shouldn't have any problems there.
Thanks again.
Kind regards,
Ben