Alteryx Designer Desktop Discussions

SOLVED

Predictive Modeling – Training/Testing Question

sjm
8 - Asteroid

Hi All - I’ve created a forest model to predict future units. I took 70% of a dataset to train the model and 30% to test the model.

 

I was wondering if anyone can offer any advice on the following questions:

  • When we put the model into production, should we use that same model (with 70% of the original data) to score the new records each week? Or should our production model be based on 100% of the original data?
  • We were thinking about the possibility of adding a new week of data to the training set on a weekly basis. If we do this, should we use 100% of the new records or should we set 30% aside (to test to see if the new data adds value)?

Please let me know if I can clarify anything.

 

Thanks,

Steve

2 REPLIES
JohnJPS
15 - Aurora

Hi Steve,

 

In my experience, the 70/30 split is mainly useful for validating hyperparameter choices (e.g. the settings in the tool's configuration panel). Once you're happy with those, retraining on 100% of the data generally gives a slightly better model, so that's my preference.
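
A minimal sketch of that workflow in plain Python, for illustration only. It is not the Alteryx Forest Model tool: a tiny k-nearest-neighbours regressor stands in for the forest, `k` stands in for the config panel settings, and the data is synthetic. The function names (`knn_predict`, `holdout_mse`, `tune_then_refit`) are hypothetical, and a random 70/30 split is assumed; time-ordered data may call for a different kind of split.

```python
import random

def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def holdout_mse(train, test, k):
    """Mean squared error on the held-out rows for a given k."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)

def tune_then_refit(rows, candidates, train_frac=0.7, seed=0):
    """Choose k on a 70/30 split, then keep 100% of the rows for the final model."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    scores = {k: holdout_mse(train, test, k) for k in candidates}
    best_k = min(scores, key=scores.get)
    # For this stand-in model, "retraining on 100%" just means using every row.
    return best_k, rows, scores

# Synthetic stand-in data: y is roughly 2x plus noise (an assumption).
rng = random.Random(42)
rows = [(x, 2.0 * x + rng.gauss(0, 0.5)) for x in (i / 10 for i in range(100))]

best_k, final_train, scores = tune_then_refit(rows, candidates=[1, 3, 5, 9])
prediction = knn_predict(final_train, 5.05, best_k)  # scored by the 100% model
```

The holdout scores are only used to pick the setting; the model that scores new records is built from all of the rows.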

 

For retraining, I would do the same thing: tune hyperparameters using a 70/30 split, then retrain on 100%. If, over time, I become comfortable that my hyperparameters never need alteration, I'd just retrain on the new 100%.
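
The weekly version of this could look roughly like the sketch below. Again, this is a sketch under assumptions, not Alteryx code: a small k-nearest-neighbours model stands in for the forest, the weekly data is synthetic, and `predict`, `tune_k`, `weekly_update`, and `make_week` are hypothetical names. Each week, all of the new rows are appended, the setting is optionally re-tuned on a fresh 70/30 split, and the model is then built from the full history.

```python
import random

def predict(train, x, k):
    # Stand-in model: mean target of the k nearest training rows.
    nearest = sorted(train, key=lambda r: abs(r[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def tune_k(rows, candidates, seed=0):
    # 70/30 holdout: return the candidate k with the lowest test MSE.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    train, test = shuffled[:cut], shuffled[cut:]

    def mse(k):
        return sum((predict(train, x, k) - y) ** 2 for x, y in test) / len(test)

    return min(candidates, key=mse)

def weekly_update(history, new_week, candidates, retune=True, current_k=None):
    # Add 100% of the new week's rows, optionally re-tune, then use everything.
    history = history + new_week
    k = tune_k(history, candidates) if retune or current_k is None else current_k
    return history, k

rng = random.Random(7)

def make_week(w, n=20):
    # Synthetic week of data: y is roughly 3x plus noise (an assumption).
    return [(w + i / n, 3.0 * (w + i / n) + rng.gauss(0, 0.5)) for i in range(n)]

history, chosen = make_week(0), []
for week in range(1, 6):
    history, k = weekly_update(history, make_week(week), candidates=[1, 3, 5, 9])
    chosen.append(k)

# Once the chosen setting has been stable for a while, skip the re-tuning step.
history_next, k_next = weekly_update(
    history, make_week(6), candidates=[1, 3, 5, 9], retune=False, current_k=chosen[-1]
)
```

Keeping a record like `chosen` is one way to judge whether the settings still need revisiting each week.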

 

That's just me; I'd be interested in other viewpoints too though.

 

sjm
8 - Asteroid

Thanks for sharing John! This is helpful. Open to hearing any other views, but we'll definitely consider this approach. 
