One thing I can suggest is that you split your dataset into two samples: one used to train your model (Estimation, ~80%) and one held back to check its predictions (Validation, ~20%).
The reason to do this is that otherwise your assessment of the model will be biased: measured on the same data it was trained on, the model will seem to make more accurate predictions than it actually can. The issue becomes clearer if you apply the same model to a different dataset that it has never "seen" before; chances are your accuracy will be quite low there.
I will hopefully come back with more comments, but this is a key thing you should take into consideration.
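To make the split concrete, here is a minimal sketch in plain Python, not tied to any particular tool. The function name `split_rows`, the fixed seed, and the stand-in data are all assumptions made for illustration; the 80/20 ratio is the one suggested above.

```python
import random


def split_rows(rows, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (estimation, validation).

    The 0.2 default and the seed are illustrative assumptions; a fixed seed
    only makes the split reproducible between runs.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    estimation = shuffled[n_validation:]
    return estimation, validation


# Hypothetical stand-in for a real dataset: 100 row identifiers.
rows = list(range(100))
estimation, validation = split_rows(rows)
print(len(estimation), len(validation))  # 80 20
```

You would fit the model on `estimation` only and compare its predictions against the known outcomes in `validation`, which the model never saw during training.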
It depends on the model you're using. The Linear and Gamma regression models require numeric predictors, so you have to one-hot encode any categorical variables that you want to include. Most of the other models can handle categorical variables, but you should research the models first. You should also look at the Data Investigation tools. They can help you find correlations between your possible predictors, so you can make sure you're not including pairs of variables that are strongly positively or negatively correlated.
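If it helps to see the two ideas outside of any tool, here is a rough sketch in plain Python of one-hot encoding and of a pairwise correlation check. Everything in it is an assumption for illustration: the helper names, the sample columns, and especially the 0.9 cutoff for "strongly correlated", which is a judgment call rather than a rule.

```python
from math import sqrt


def one_hot(values):
    """Turn one categorical column into a dict of 0/1 indicator columns."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def strongly_correlated_pairs(columns, threshold=0.9):
    """List (name_a, name_b, r) for every pair of columns with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:  # catches strong negative correlation too
                pairs.append((a, b, r))
    return pairs


# Hypothetical sample data.
encoded = one_hot(["red", "green", "red", "blue"])
print(encoded["is_red"])  # [1, 0, 1, 0]

numeric = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],  # nearly the same information as height_cm
    "age": [30, 22, 41, 35],
}
pairs = strongly_correlated_pairs(numeric)
print([(a, b) for a, b, _ in pairs])  # [('height_cm', 'height_in')]
```

Two notes: with a linear model you would normally drop one of the indicator columns, since the full set always sums to 1 and is therefore perfectly collinear with the intercept. And for a flagged pair such as the two height columns, you would usually keep only one of them as a predictor.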