
Machine Learning for Forecasting

Forecasting exercises typically begin by dividing the sample data into two sub-samples: the in-sample data and the out-of-sample data.

The in-sample data is used to train the model, while the out-of-sample data is used to compare the trained model's predictions against observations it has never seen. For a perfect model, the difference between the out-of-sample data and the predictions would be zero.
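As a minimal sketch of such a split, the snippet below trains a simple regression on the in-sample portion of a toy synthetic dataset and measures the error out-of-sample; all variable names and the synthetic data here are illustrative, not from any particular project:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                                  # toy features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

# Hold out the last 20% as out-of-sample data (no shuffling,
# as in a time-ordered forecasting setting).
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_in, y_in)                     # train in-sample
mse = mean_squared_error(y_out, model.predict(X_out))          # evaluate out-of-sample
print(f"Out-of-sample MSE: {mse:.4f}")
```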

In practice this is impossible, because the behaviour of the data in the in-sample space is not necessarily repeated in the out-of-sample space. No wonder all forecasts are wrong!

To mitigate this issue, we can use resampling techniques such as the bootstrap, or split the in-sample data into k folds, and use these sub-samples to train the model(s).
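As a sketch of what these sub-samples look like in practice, the snippet below builds a k-fold partition and a bootstrap resample of the in-sample data, assuming the illustrative X_in and y_in arrays from the previous example:

```python
from sklearn.model_selection import KFold
from sklearn.utils import resample

# k-fold: partition the in-sample data into k non-overlapping folds.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kf.split(X_in)):
    print(f"fold {fold}: {len(train_idx)} train rows, {len(val_idx)} validation rows")

# Bootstrap: draw a resample of the same size, with replacement.
X_boot, y_boot = resample(X_in, y_in, replace=True, random_state=0)
```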

Fortunately, machine learning regression techniques such as Support Vector Regression, Random Forests, and Gradient Boosting can be used to train many models on the k-fold sub-samples of the in-sample data before they are used for prediction in the out-of-sample space. Cross-validation can then be used to assess the performance of these models.
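A rough sketch of this workflow, comparing a few of the regressors mentioned above with 5-fold cross-validation (again reusing the illustrative X_in and y_in from the first example):

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

models = {
    "SVR": SVR(kernel="rbf"),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

for name, model in models.items():
    # 5-fold cross-validated negative MSE, computed on the in-sample data only.
    scores = cross_val_score(model, X_in, y_in, cv=5,
                             scoring="neg_mean_squared_error")
    print(f"{name}: mean CV MSE = {-scores.mean():.4f}")
```

The out-of-sample data stays untouched throughout: cross-validation only reuses the in-sample data, so the final out-of-sample comparison remains an honest test.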

Python and R code is available on request!

