Alteryx Designer Desktop Discussions

SOLVED

Predictive Modeling – Training/Testing Question

sjm
8 - Asteroid

Hi All - I’ve created a forest model to predict future units. I took 70% of a dataset to train the model and 30% to test the model.

 

I was wondering if anyone can offer any advice on the following questions:

  • When we put the model into production, should we use that same model (trained on 70% of the original data) to score the new records each week? Or should our production model be trained on 100% of the original data?
  • We were thinking about adding a new week of data to the training set each week. If we do this, should we train on 100% of the new records, or should we set 30% aside to test whether the new data adds value?

Please let me know if I can clarify anything.

 

Thanks,

Steve

2 REPLIES
JohnJPS
15 - Aurora

Hi Steve,

 

For me, the 70/30 split is mainly useful for validating hyperparameter tuning (e.g. config panel settings). Once you're happy with those, retraining with 100% of the data generally gives a slightly better model, so that's my preference.
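
To make the idea concrete, here is a rough sketch in Python rather than an Alteryx workflow. It is only an illustration under stated assumptions: a tiny k-nearest-neighbours regressor stands in for the forest model so the example runs with the standard library alone, `k` plays the role of a config panel setting, and the data and the names `knn_predict`, `mean_abs_error`, and `tune_then_refit` are made up for the example.

```python
import random

def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k nearest training rows (one feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_abs_error(train, test, k):
    """Average absolute error on the held-out rows."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in test) / len(test)

def tune_then_refit(rows, candidate_ks, train_frac=0.7, seed=0):
    """Choose k on a 70/30 holdout, then build the final model on 100% of the rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]

    # Score each candidate setting on the 30% that the model has not seen.
    scores = {k: mean_abs_error(train, test, k) for k in candidate_ks}
    best_k = min(scores, key=scores.get)

    # For k-NN, "retraining on 100%" simply means keeping every row as training data.
    def final_model(x):
        return knn_predict(rows, x, best_k)

    return best_k, scores, final_model

# Synthetic data (an assumption for the sketch): units roughly equal 2 * x plus noise.
rng = random.Random(42)
rows = [(x, 2.0 * x + rng.gauss(0, 1.0)) for x in range(100)]

best_k, scores, model = tune_then_refit(rows, candidate_ks=[1, 3, 5, 9])
print("chosen k:", best_k, "prediction at x=50.5:", round(model(50.5), 1))
```

The holdout error is only used to choose the setting; the model that goes to production is the one fitted on all rows.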

 

For retraining, I would do the same thing: tune hyperparameters using a 70/30 split, and then retrain on 100%. If I'm comfortable over time that my hyperparameters never need alteration, then I'd just retrain on the new 100%.
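
A weekly version of that routine might look something like the sketch below. Again, this is Python with a simple k-nearest-neighbours stand-in instead of a forest, the data are synthetic, and `weekly_update`, `holdout_error`, and the "re-tune on the third week" schedule are hypothetical choices for illustration only.

```python
import random

def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k nearest training rows (one feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def holdout_error(rows, k, train_frac=0.7, seed=0):
    """Mean absolute error of setting k on a 70/30 split of rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    return sum(abs(knn_predict(train, x, k) - y) for x, y in test) / len(test)

def weekly_update(history, new_week, k, candidate_ks=None):
    """Add the new week to the training data; re-tune k only if candidates are given."""
    combined = history + new_week
    if candidate_ks:
        k = min(candidate_ks, key=lambda c: holdout_error(combined, c))
    # Either way, the production model uses 100% of the combined rows.
    return combined, k

# Synthetic history and weekly batches (assumed data, 10 new rows per week).
rng = random.Random(1)
history = [(x, 2.0 * x + rng.gauss(0, 1.0)) for x in range(70)]
k = 5  # setting chosen earlier on a 70/30 split

for week in range(3):
    start = 70 + 10 * week
    new_week = [(x, 2.0 * x + rng.gauss(0, 1.0)) for x in range(start, start + 10)]
    # Re-check the setting occasionally; otherwise just refit on everything.
    candidates = [1, 3, 5, 9] if week == 2 else None
    history, k = weekly_update(history, new_week, k, candidates)

# Score a new record with the model built on all rows seen so far.
prediction = knn_predict(history, 100, k)
```

How often to re-tune is a judgment call; the sketch only shows that re-tuning and refitting on all of the data are separate steps.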

 

That's just me; I'd be interested in other viewpoints too though.

 

sjm
8 - Asteroid

Thanks for sharing, John! This is helpful. Open to hearing any other views, but we'll definitely consider this approach.
