Abstract

Mining data streams has become an important topic due to the increasing availability of vast amounts of online data. In such incremental learning scenarios, observations arrive in a sequence over time and are subject to changes in the data distribution, known as concept drift. Interleaved test-then-train evaluation is often used in supervised learning from streaming data. The idea is intuitive: each instance is first used to test the model and then to train it. However, true class labels may be missing or may arrive well after the prediction, in which case they cannot be used for training and/or drift detection. Based on these considerations, we introduce LESS-TWE, an ensemble-based method for online learning in domains where full reliance on labels is infeasible. Our approach combines weighted soft voting with unsupervised drift detection to reduce the dependency on labels during model construction. When the label is unavailable, the most confident label, as predicted through weighted soft voting, is selected. Similarly, our unsupervised drift detector flags drifts based on the voting confidence rather than on the true label. Our experimental evaluation indicates that our algorithm is very fast, achieves predictive accuracy comparable to the state of the art, and outperforms baseline methods.
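The two mechanisms described above, weighted soft voting for pseudo-labelling and drift flagging from voting confidence, can be sketched roughly as follows. This is an illustrative sketch only: the function names, ensemble weights, window size, and drop-ratio rule are assumptions for exposition, not details taken from LESS-TWE itself.

```python
# Illustrative sketch of weighted soft voting with confidence-based
# pseudo-labelling and a confidence-drop drift flag. All parameters
# and the drift rule are hypothetical, not the paper's exact method.
from collections import deque


def weighted_soft_vote(prob_vectors, weights):
    """Combine per-model class-probability vectors into one weighted average."""
    n_classes = len(prob_vectors[0])
    total = sum(weights)
    return [
        sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]


def pseudo_label(prob_vectors, weights):
    """When the true label is unavailable, pick the most confident class."""
    combined = weighted_soft_vote(prob_vectors, weights)
    conf = max(combined)
    return combined.index(conf), conf


class ConfidenceDriftDetector:
    """Flag a drift when mean voting confidence over a recent window
    drops below a fraction of the long-run mean (an illustrative rule)."""

    def __init__(self, window=50, drop_ratio=0.8):
        self.window = deque(maxlen=window)
        self.total = 0.0
        self.count = 0
        self.drop_ratio = drop_ratio

    def update(self, confidence):
        self.window.append(confidence)
        self.total += confidence
        self.count += 1
        long_run = self.total / self.count
        recent = sum(self.window) / len(self.window)
        # Only flag once the window is full, to avoid cold-start noise.
        return (len(self.window) == self.window.maxlen
                and recent < self.drop_ratio * long_run)


# Example: three ensemble members voting over two classes; labels missing,
# so the ensemble pseudo-labels the instance from the weighted vote.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
weights = [2.0, 1.0, 1.0]
label, conf = pseudo_label(probs, weights)  # class 0, confidence 0.65

# Feed the detector stable confidences, then a sudden drop.
det = ConfidenceDriftDetector(window=5, drop_ratio=0.8)
flags = [det.update(c) for c in [0.9] * 20 + [0.3] * 10]
drift_flagged = any(flags)
```

In this sketch the detector never touches true labels; it reacts only to the ensemble's own confidence, which is the property the abstract highlights for label-scarce streams.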
