I am trying out some of the most recent features of the Alteryx Intelligence Suite and am stuck on one specific question regarding the Predict tool. When using the Assisted Modeling tool with guidance, the various preprocessing steps are automatically added to the workflow canvas. I want to split my input data into training and test sets using the Create Samples tool before modeling, so that I can verify the predictions on unseen test data (see screenshot below).
My question/problem is as follows:
When running Assisted Modeling, preprocessing steps such as setting data types, cleaning up missing values, and one-hot encoding are performed on the training set before the data is fit with the Classification tool. When connecting the Predict tool, are all of these preprocessing steps also performed on the validation/test dataset, or do I need to add them manually? Unfortunately, the only output of the Machine Learning tools is of type "Model", so I cannot really verify this on my own.
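For context, the behavior I would expect is what scikit-learn calls a pipeline: the preprocessing steps are fitted on the training data once and then re-applied (not re-fitted) to any data passed in for prediction. A minimal sketch of that idea outside Alteryx, with hypothetical column names rather than my actual dataset:

```python
# Sketch of the pipeline idea: preprocessing is fitted on the training
# data and automatically re-applied at predict time.
# Column names ("Contract", "tenure", "Churn") are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "One year"] * 25,
    "tenure": [1, 12, 48, None] * 25,
    "Churn": [1, 0, 0, 1] * 25,
})
X, y = df[["Contract", "tenure"]], df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Imputation, one-hot encoding, and the model live in one pipeline, so
# predict() applies the transforms exactly as fitted on the training set.
pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("impute", SimpleImputer(strategy="median"), ["tenure"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
    ])),
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)
preds = pipe.predict(X_test)  # raw test data: no manual preprocessing needed
```

If the Predict tool works this way, the raw validation data would go straight into its D anchor without any manual transformation tools in front of it.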
If I try to preprocess the validation data using the same transformation tools with the same configurations as in the picture below, I receive the following error:
Predict (26) Prediction data is missing the following columns: ['PaperlessBilling', 'StreamingMovies', 'TechSupport', 'OnlineBackup', 'Partner', 'customerID', 'PhoneService', 'MultipleLines', 'Contract', 'StreamingTV', 'TotalCharges', 'Dependents', 'tenure', 'SeniorCitizen', 'gender', 'OnlineSecurity', 'PaymentMethod', 'MonthlyCharges', 'DeviceProtection', 'InternetService']
I have a question. After connecting the validation data directly to the D input anchor of the Predict tool, you get the predicted values for the validation dataset, but there is no performance indicator associated with them. You basically cannot tell whether the model is a good fit.
Does that mean the Assisted Modeling tool uses all the data to train and compare models, and does not need the data divided into training and validation sets?
My understanding is that the Predict tool is similar to the Score tool. Model comparison is already embedded in the Assisted Modeling configuration process, whereas tools in the Predictive palette need the Model Comparison tool and a validation dataset to compare model performance.
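Outside Alteryx, that comparison step amounts to scoring each candidate model on the same held-out set yourself. A minimal scikit-learn sketch (the data and candidate models here are illustrative, not the Assisted Modeling internals):

```python
# Sketch: comparing fitted models on a held-out validation set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

scores = {}
for name, model in [("logit", LogisticRegression(max_iter=1000)),
                    ("forest", RandomForestClassifier(random_state=0))]:
    model.fit(X_train, y_train)          # fit on training data only
    scores[name] = accuracy_score(y_val, model.predict(X_val))
# scores now holds one validation accuracy per candidate model
```

In the Alteryx workflow, the Model Comparison tool plays the role of this loop: it takes the model outputs plus the validation data and reports the performance metrics side by side.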