Multiple Imputation Ensembles for Time Series (MIE-TS)

Aliya Aleryani,Aaron Bostrom,Wenjia Wang,Beatriz Iglesia

doi:10.1145/3551643

Abstract

Time series classification has become an interesting field of research, thanks to the extensive studies conducted in the past two decades. Time series may have missing data, which may affect both the representation and also modeling of time series. Thus, recovering missing data using appropriate time series-based imputation methods is an essential step. Multiple imputation is a data recovery method where it produced multiple imputed data. The method proves its usefulness in terms of reflecting the uncertainty inherit in missing data; however, it is under-researched in time series problems. In this article, we propose two multiple imputation approaches for time series. The first is a multiple imputation method based on interpolation. The second is a multiple imputation and ensemble method. First, we simulate missing consecutive sub-sequences under a Missing Completely at Random mechanism; then, we use single/multiple imputation methods. The imputed data are used to build bagging and stacking ensembles. We build ensembles using standard classification algorithms as well as time series classifiers. The standard classifiers involve Random Forest, Support Vector Machines, K-Nearest Neighbour, C4.5, and PART while TSCHIEF, Proximity Forest, Time Series Forest, RISE, and BOSS are chosen as time series classifiers. Our findings show that the combination of multiple imputation and ensemble improves the performance of the majority of classifiers tested in this study, often above the performance obtained from the complete data, even under increasing missing data scenarios. This may be because the diversity injected by multiple imputation has a very favourable and stabilising effect on the classifier performance, which is a very important finding.

Full Text