Hi, I am trying out the Intelligence Suite's Machine Learning Tools and have run into an error message, but I don't know how to fix it!
Assisted Modeling evaluated three classification models, of which Random Forest scored best and didn't appear to have any issues. However, when I run that model against the original dataset as a test using the Predict Tool, I get the following error message:
"Error: Predict: Scoring the model failed with error: Input contains NaN. This is likely due to unhandled null values. Use a Transformation tool with 'Missing value imputation' transformer to remove null values."
One or more of your string/categorical fields contain null values. I would suggest using a Formula tool to replace these, then letting Assisted Modeling one-hot encode the categorical variables.
That should fix the problem.
Remember, you need to apply the same prep to both the records that train the model and the records you are going to predict; otherwise you'll have a variable mismatch.
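For anyone who wants to see the idea outside of Designer, here is a minimal Python sketch of the same prep: replace nulls in the categorical fields, then one-hot encode using the categories learned from the training records only, so the scoring records end up with identical columns. This is not what the Intelligence Suite tools do internally; the field names (`Region`, `Payment_Terms`), the `"Unknown"` placeholder, and the sample records are all made up for illustration.

```python
# Hypothetical field names and placeholder; none of these come from the thread.
CATEGORICAL_FIELDS = ["Region", "Payment_Terms"]
MISSING = "Unknown"


def fill_nulls(records, fields=CATEGORICAL_FIELDS, placeholder=MISSING):
    """Return copies of the records with None/blank values replaced."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is left untouched
        for field in fields:
            value = rec.get(field)
            if value is None or str(value).strip() == "":
                rec[field] = placeholder
        cleaned.append(rec)
    return cleaned


def fit_categories(records, fields=CATEGORICAL_FIELDS):
    """Learn the category list for each field from the training records only."""
    return {f: sorted({rec[f] for rec in records}) for f in fields}


def one_hot(records, categories):
    """Encode with the training categories so the columns always match."""
    encoded = []
    for rec in records:
        row = {}
        for field, cats in categories.items():
            for cat in cats:
                row[f"{field}_{cat}"] = 1 if rec[field] == cat else 0
        encoded.append(row)
    return encoded


# Made-up sample data: one training set, one set to score.
train = fill_nulls([
    {"Region": "North", "Payment_Terms": "Net30"},
    {"Region": None, "Payment_Terms": "Net60"},
])
score = fill_nulls([
    {"Region": "", "Payment_Terms": None},
])

categories = fit_categories(train)
train_encoded = one_hot(train, categories)
score_encoded = one_hot(score, categories)
```

Because both sets go through `fill_nulls` and are encoded with the same `categories`, the columns line up. Note that in this sketch a value never seen in training simply encodes as all zeros for that field.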
@Stormphoenix, @mceleavey, @PeterAP A few of us have seen this error pop up in the Intelligence Suite starter kit example "Validate Invoices at Risk." It appears that starting with 2022.1, the Image Template or Image to Text tool is picking up some alpha characters in the "Supplier" field from the input PDFs. The rest of the flow, including the Assisted Modeling tools, expects the Supplier field to be numeric.
Quick fix to get the flow working: filter out the records that have alpha characters in the "Supplier" field, change the field type to Double, and pass the remaining records along in the flow.
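In Designer this is a Filter tool followed by a Select tool, but the logic of the quick fix can be sketched in Python as follows. The function name `split_numeric_suppliers` and the sample values are hypothetical and are not taken from the starter kit data; the sketch assumes a valid Supplier value consists only of digits with an optional decimal part.

```python
import re

# Assumption: a valid Supplier value is digits with an optional decimal part.
NUMERIC = re.compile(r"\d+(\.\d+)?")


def split_numeric_suppliers(records, field="Supplier"):
    """Split records into (kept, rejected) based on whether the field is numeric.

    Kept records are copies with the field converted to float (a double).
    """
    kept, rejected = [], []
    for rec in records:
        raw = rec.get(field)
        text = "" if raw is None else str(raw).strip()
        if NUMERIC.fullmatch(text):
            converted = dict(rec)
            converted[field] = float(text)
            kept.append(converted)
        else:
            rejected.append(rec)
    return kept, rejected


# Made-up values imitating OCR output, including a letter "O" misread for a zero.
ocr_records = [
    {"Invoice": "A-1", "Supplier": "10452"},
    {"Invoice": "A-2", "Supplier": "1O452"},
    {"Invoice": "A-3", "Supplier": None},
]

kept, rejected = split_numeric_suppliers(ocr_records)
```

Keeping the rejected records in a separate list, rather than silently dropping them, makes it easier to check how many invoices the OCR step is misreading.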
As @PhilipL mentioned, we did a thorough examination of the issue, and found that it was specifically the "Supplier" field that sits on the right edge of the blue bar at the top of the sample PDFs. Recreating that template field with slightly different boundaries corrected some of the values, but we weren't able to achieve 100%. Going further, we tested on various colored backgrounds and text, but the issue only seemed to manifest in that particular circumstance. Removing records that don't resolve to a numeric data type is the best solution so far for this Starter Kit, and there is a replacement workflow that does just this posted by @Samantha_Jayne in this thread.