Hi All - I’ve created a forest model to predict future units. I took 70% of a dataset to train the model and 30% to test the model.
I was wondering if anyone can offer any advice on the following questions:
Please let me know if I can clarify anything.
Thanks,
Steve
Hi Steve,
In my experience, the 70/30 split is most useful for validating hyperparameter tuning (e.g., config panel settings). Once you're happy with those, retraining on 100% of the data generally gives a slightly better model, so that's my preference.
For retraining, I would do the same thing: tune hyperparameters using a 70/30 split, then retrain on 100% of the data. If, over time, I'm comfortable that my hyperparameters never need alteration, I just retrain directly on 100% of the new data.
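To make the workflow concrete, here's a minimal sketch in Python using scikit-learn's `RandomForestRegressor` (an assumption — your forest tool may differ): tune on a 70/30 split, then refit the chosen configuration on all of the data. `X` and `y` are synthetic stand-ins for your real features and target.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; replace with your own features/target.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# 1) Hold out 30% to validate hyperparameter choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

best_params, best_score = None, -np.inf
for n_estimators in (50, 100):  # illustrative tuning grid
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)  # R^2 on the held-out 30%
    if score > best_score:
        best_params, best_score = {"n_estimators": n_estimators}, score

# 2) Once happy with the hyperparameters, retrain on 100% of the data.
final_model = RandomForestRegressor(**best_params, random_state=0).fit(X, y)
```

The same pattern applies when new data arrives: keep the tuned hyperparameters fixed and refit `final_model` on the updated full dataset.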
That's just me; I'd be interested in other viewpoints too though.
Thanks for sharing, John! This is helpful. Still open to hearing other views, but we'll definitely consider this approach.