Paper: Are forecasting competitions data representative of the reality?

This paper by Spiliotis et al., in the special issue of the International Journal of Forecasting on the M4 Competition, is an interesting look at characterizing the time series in forecasting competitions according to their statistical properties. The paper references the work of Kang et al. (2017), which identifies 6 features to characterize time series, including properties such as trend, seasonality, and autocorrelation. Each series becomes a point in this feature space, so a collection of time series can be represented as a region in it, and Principal Components Analysis can then be used to map that region onto the two-dimensional plane. The collections of series from two different forecasting competitions can then be compared visually, and holes in the feature space can be identified. When one of the collections is taken as the reference, gaps in the projection of the second collection are informative because they mark kinds of series that the second collection does not cover. This matters especially if methods trained on the smaller set are then applied to the larger set.
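To make the feature-space idea concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation: the three features below are simplified stand-ins of my own (a straight-line trend measure, a per-phase seasonal measure, and lag-1 autocorrelation) rather than the features defined by Kang et al., and the series are synthetic. The function names and the seasonal period of 12 are assumptions made for illustration. The sketch fits the projection on a reference collection and then maps a second collection into the same plane so the two can be compared.

```python
import numpy as np


def lag1_autocorr(y):
    """First-order autocorrelation of a series."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    denom = np.dot(d, d)
    return float(np.dot(d[:-1], d[1:]) / denom) if denom > 1e-12 else 0.0


def _detrend(y):
    """Remove a least-squares straight line; returns the residuals."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)


def trend_strength(y):
    """Share of variance explained by a straight line (crude stand-in)."""
    total = np.var(np.asarray(y, dtype=float))
    return float(1.0 - np.var(_detrend(y)) / total) if total > 1e-12 else 0.0


def seasonal_strength(y, period):
    """Share of detrended variance explained by per-phase means.

    Assumes len(y) >= period so that every phase has at least one value.
    """
    resid = _detrend(y)
    phase = np.arange(len(resid)) % period
    phase_means = np.array([resid[phase == p].mean() for p in range(period)])
    total = np.var(resid)
    return float(np.var(phase_means[phase]) / total) if total > 1e-12 else 0.0


def features(y, period=12):
    return np.array([trend_strength(y), seasonal_strength(y, period), lag1_autocorr(y)])


def fit_projection(reference):
    """Standardize the reference features and find their first two principal axes."""
    X = np.asarray(reference, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0
    _, _, vt = np.linalg.svd((X - mean) / std, full_matrices=False)
    return mean, std, vt[:2]


def project(feats, mean, std, components):
    """Map feature vectors onto the plane fitted on the reference collection."""
    return ((np.asarray(feats, dtype=float) - mean) / std) @ components.T


# Synthetic collections, for illustration only.
rng = np.random.default_rng(0)
t = np.arange(120)


def make_series(slope, amplitude):
    return slope * t + amplitude * np.sin(2 * np.pi * t / 12) + rng.normal(size=t.size)


reference = np.array([features(make_series(s, a))
                      for s in (0.0, 0.05, 0.2)
                      for a in (0.0, 1.0, 3.0)
                      for _ in range(10)])
other = np.array([features(make_series(0.0, 0.0)) for _ in range(20)])

mean, std, comps = fit_projection(reference)
ref_2d = project(reference, mean, std, comps)
other_2d = project(other, mean, std, comps)
```

Plotting `ref_2d` and `other_2d` on the same axes gives the kind of picture described above: regions covered by the reference points but empty of points from the second collection are the gaps.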

Spiliotis et al. extend the feature description by adding 4 features from Wang et al. (2006). They then take 4 models used in the M competitions and perform a linear regression of each model’s forecast errors on the 10 features to determine which features seem to be more influential for which model. Alternatively, one could use this to identify which models are more robust for particular kinds of time series. All in all this is an interesting approach. I’m going to see if I can apply it to the time series underlying some of the forecasting questions in the recent IARPA Geopolitical Forecasting Challenge 2 to see where they fall in feature space.
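The regression step can be sketched the same way. This is a rough illustration under my own assumptions, not the authors' code: the features and errors are synthetic, the model names `model_a` and `model_b` are hypothetical, and I standardize the features first so that coefficient magnitudes are comparable across features. One ordinary least squares fit is run per model.

```python
import numpy as np


def fit_error_regression(features, errors):
    """Ordinary least squares of forecast errors on standardized features."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(errors, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    A = np.column_stack([np.ones(len(Z)), Z])  # intercept column plus features
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "intercept": coef[0], "coef": coef[1:]}


# Synthetic data, for illustration only: 500 series described by 10 features.
rng = np.random.default_rng(1)
n_series, n_features = 500, 10
X = rng.normal(size=(n_series, n_features))

# Hypothetical models whose errors each depend on a single feature.
true_coef = {"model_a": np.zeros(n_features), "model_b": np.zeros(n_features)}
true_coef["model_a"][0] = 0.5
true_coef["model_b"][3] = -0.3

fits = {}
for name, beta in true_coef.items():
    errors = 1.0 + X @ beta + rng.normal(scale=0.1, size=n_series)
    fits[name] = fit_error_regression(X, errors)

# Rank features by the size of their standardized coefficient for each model.
for name, fit in fits.items():
    order = np.argsort(-np.abs(fit["coef"]))
    print(name, "most influential feature indices:", order[:3])
```

Reading across models for a single feature shows which models are sensitive to it; reading across features for a single model shows where that model tends to struggle.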

Referring back to my earlier short post on ensembles, it seems to me that this technique could be used to determine ensemble weights for a collection of time series models. Using the regression model, predict the expected error of each model from the historical values of the time series, as decomposed into the 6- or 10-feature space. Then weight the models so that the one with the lowest predicted error receives the greatest weight. I expect the authors have already thought of this and have probably researched it already, or are doing so now.
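Here is one way the weighting idea might look in Python. Everything specific in it is an assumption of mine: the three model names, their regression coefficients, the three-feature vector, and the forecasts are made-up placeholders, and inverse-error weighting is just one simple scheme that gives the most accurate model the largest weight. The paper does not propose this.

```python
import numpy as np


def predicted_error(intercept, coef, z):
    """Expected error for one model, given standardized features z."""
    return float(intercept + np.dot(coef, z))


def inverse_error_weights(errors, floor=1e-6):
    """Weights proportional to 1 / predicted error, normalized to sum to 1.

    The floor guards against zero or negative predictions from the linear model.
    """
    e = np.maximum(np.asarray(errors, dtype=float), floor)
    inv = 1.0 / e
    return inv / inv.sum()


# Hypothetical per-model regressions (intercept, coefficients on 3 features).
regressions = {
    "model_a": (1.0, np.array([0.2, 0.1, 0.0])),
    "model_b": (0.9, np.array([-0.3, 0.0, 0.5])),
    "model_c": (1.2, np.array([0.1, -0.4, 0.0])),
}

# Standardized features of the new series' history (placeholder values).
z_new = np.array([1.0, -0.5, 0.2])

names = list(regressions)
errors = np.array([predicted_error(b0, b, z_new) for b0, b in regressions.values()])
weights = inverse_error_weights(errors)

# Placeholder forecasts: one row per model, one column per horizon step.
forecasts = np.array([
    [10.0, 10.5, 11.0, 11.5],
    [9.5, 10.0, 10.2, 10.4],
    [10.8, 11.0, 11.6, 12.0],
])
combined = weights @ forecasts

for name, err, w in zip(names, errors, weights):
    print(f"{name}: predicted error {err:.3f}, weight {w:.3f}")
print("combined forecast:", combined)
```

Because the weights are nonnegative and sum to one, the combined forecast at each step lies between the lowest and highest individual forecasts. Whether it actually beats a simple average would have to be checked on held-out series.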