Hi,
I'm new to using the Machine Learning Tool Palette in the Alteryx Intelligence Suite. I've read the help features and followed the web-based learning videos, but I wondered if anyone could give a view as to whether what I'm trying to do makes sense in the grand scheme of things!
I am using a Score field which ranges between 0 and 100 as my Prediction field.
I want to test the accuracy of the Score field by predicting it from the data in the other fields, all of which should influence the score given. These other fields include positive/negative sentiment of comments and other factors. I've used the Assisted Modelling Tool to do this.
The question I need to check with the community is on the final tool "Predict Values". As I'm looking to compare the predicted value against the actual score, I've connected the "D" anchor to the original dataset. I think this makes sense, but wanted to check as I wasn't sure if I needed to connect it to a completely new set of data.
I'm not sure if there's a dummies guide to this suite somewhere!
Hi @PeterAP,
Great that you found the learning videos already.
ML models are usually 'trained' on data where the prediction field is known. You can then use the trained model to make predictions.
You can see this in the Predict tool as the D anchor for Data and the M anchor for the (trained) Model. In general, you don't want to run the prediction on the training data, as this defeats the purpose of testing whether the model is any good. Here is what you can do instead: use the Create Samples tool to split your data into a training and an evaluation dataset, e.g. 70% for training and 30% for evaluation. Train your model on one part, then run the prediction on the other part to compare the prediction with the actual value.
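If it helps to see the same pattern outside of Designer, here is a minimal Python sketch of the split/train/predict/compare idea using scikit-learn. The file name, column names, and model choice are made up for illustration:

```python
# Minimal sketch of the split -> train -> predict -> compare pattern.
# The file name, column names, and model are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("cases.csv")            # hypothetical input data
X = df.drop(columns=["score"])           # the predictor fields
y = df["score"]                          # the 0-100 Score field

# Hold out 30% of the rows for evaluation, train on the other 70%.
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# Predict on rows the model never saw, then compare to the actual scores.
comparison = pd.DataFrame({
    "actual": y_eval,
    "predicted": model.predict(X_eval),
})
print(comparison.head())
```

That is exactly what the Create Samples and Predict tools do for you on the canvas.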
You can find a good demo on Assisted Modeling in this webinar in part 3: https://pages.alteryx.com/master-how-to-become-a-citizen-data-scientist.html
Please mark this as the solution if it answered your question.
Happy solving,
Kilian
Solutions Engineer - Alteryx
Thanks @KilianL. I'm looking to gain some comfort over the accuracy of the actual score field, rather than predict based on an unknown future set of data.
I basically have a number of features, for example 'the number of negative comments' per case. Fields such as this have no directly calculated relationship to the score generated, but they should influence it: a higher number of negative comments should, for instance, result in a lower score. I therefore set about creating a model to predict the score based on these fields, so that I could compare the prediction against the actual score someone has input.
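To make that concrete, here is a rough Python sketch (outside Alteryx, with made-up file and field names) of the check I have in mind: fit a model on the feature fields, then flag the cases where the entered score deviates most from the prediction.

```python
# Rough sketch of the check I have in mind; the file name and field
# names are made up for illustration.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("cases.csv")                           # hypothetical input
features = ["negative_comments", "positive_comments"]   # example feature fields

model = LinearRegression().fit(df[features], df["score"])
df["predicted"] = model.predict(df[features])
df["residual"] = df["score"] - df["predicted"]

# The cases whose entered score most exceeds what the features suggest.
print(df.nlargest(5, "residual")[["score", "predicted", "residual"]])
```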
Is there a better way to achieve the outcome I want, rather than using Assisted Modelling?
Do you know where I can find the videos at the link you mentioned? It takes me to a sign-up page, but that only lets me download a calendar invite for 2021 rather than any links to the videos.
Thanks
@PeterAP, if you click on the calendar invite, you will see the links to all 3 sessions. They should lead you to the recordings on the on24.com platform. For a demo of Assisted Modeling, check out the third session. There is a brief intro here as well: https://www.youtube.com/watch?v=iuqfy_RJCeM
To your other question on how to know how well the model performs, and the impact of each feature: this is included in the evaluation phase of Assisted Modeling. There you can find feature importance and a range of metrics for evaluating the accuracy of the model's predictions.
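If you want to sanity-check those numbers outside of Designer, here is a short scikit-learn sketch of the same kind of figures: error metrics on the evaluation set plus a permutation-based feature importance. As in the sketches above, the file and column names are made up:

```python
# Sketch of the kind of evaluation metrics and feature-importance numbers
# that Assisted Modeling reports; file and column names are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.inspection import permutation_importance

df = pd.read_csv("cases.csv")                         # hypothetical input
X, y = df.drop(columns=["score"]), df["score"]
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

pred = model.predict(X_eval)
print("MAE:", mean_absolute_error(y_eval, pred))      # average error in score points
print("R^2:", r2_score(y_eval, pred))                 # share of variance explained

# Permutation importance: how much shuffling each feature hurts accuracy.
imp = permutation_importance(model, X_eval, y_eval, random_state=42)
for name, value in zip(X_eval.columns, imp.importances_mean):
    print(f"{name}: {value:.3f}")
```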