Have just been working through a couple of new models on the server:
- First up: the classic Alumni donation dataset (that we use in the Model Comparison Tool Sample on the Public Gallery). Performed well, no issues at all. Nice resulting model, albeit for a tiny dataset.
First, when you import data from a CSV, you get a message saying 'We're bringing in your 1 files[sic]. After you go to the next step you can't add anymore[sic] files. Make sure to bring in all files now.' Except I couldn't see a way to add more files at this stage?
Second, when you reach the AutoModel page it says 'Select Run to start the modeling process.', but the process seems to start automatically?
Next feedback: the model (Extra Trees classifier) did well, on par with the Logistic Regression in the Kaggle notebook above (which required a lot of manual processing). However, adding in the holdout data took *ages* relative to all the other steps. Any reason why?
Final feedback: the number of features used in this model was quite high (58 in total, vs. around 20 in the original dataset). What's the strategy for ensuring that we don't overfit as part of the model building/evaluation? It's not clear from the model pipeline image what feature selection strategy is used to make sure that we're not throwing everything into the kitchen sink before stirring the algos.
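One guess at where the extra features come from (an assumption on my part, not something the pipeline image confirms): if categorical columns are one-hot encoded, each one expands into a column per distinct value, so roughly 20 raw columns could plausibly grow to 58. A minimal sketch of that arithmetic, using a made-up three-row table and a hypothetical helper `one_hot_width`:

```python
from typing import Dict, List, Set


def one_hot_width(rows: List[Dict[str, object]], categorical: Set[str]) -> int:
    """Count the columns left after one-hot encoding the named categorical columns.

    Hypothetical helper for illustration only; it is not part of any product API.
    """
    if not rows:
        return 0
    width = 0
    for col in rows[0]:
        if col in categorical:
            # One output column per distinct value seen in this column.
            width += len({row[col] for row in rows})
        else:
            # Numeric columns pass through unchanged.
            width += 1
    return width


# Toy data, invented for this example (not the dataset from the post).
rows = [
    {"age": 34, "state": "CA", "degree": "BSc"},
    {"age": 51, "state": "NY", "degree": "MSc"},
    {"age": 45, "state": "TX", "degree": "BSc"},
]

raw_width = len(rows[0])
encoded_width = one_hot_width(rows, {"state", "degree"})
print(f"raw columns: {raw_width}, after one-hot encoding: {encoded_width}")
```

If that is what's happening, the feature count alone wouldn't necessarily signal overfitting, but it would still be good to know whether any selection or regularization step runs after the encoding.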