Ensemble Classifier for Imbalanced Streaming Data Using Partial Labeling

Elaheh Arabmakki, Tegjyot Singh Sethi, Mehmed Kantardzic

doi:10.1109/iri.2016.40

Abstract

Data streams in many applications have an imbalanced class distribution. The patterns in a data stream may also change over time, so the classification model must be adjusted to maintain performance. Adjusting the model requires a new set of labeled samples, which is hard to provide because labeling is expensive and time-consuming. In this paper, we propose Reduced Labeled Samples-Ensemble (RLS-E), an ensemble-based framework for classifying imbalanced data streams that builds its model from partially labeled samples. Label predictions are obtained in two ways: 1) from the combined output of all models in the ensemble, and 2) from each individual model in the ensemble. The more accurate of the two is selected as the final prediction. The method does not rely solely on the ensemble prediction, because an individual model in the ensemble may achieve much higher prediction accuracy than the ensemble as a whole. Evaluation of RLS-E on real-world data sets such as Adult, Ozone, and CovType, together with an experimental comparison against other state-of-the-art ensemble-based techniques, demonstrates that RLS-E optimizes label prediction and produces better classification performance while requiring only 5-25% of samples to be labeled.
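The selection step described in the abstract can be illustrated with a minimal sketch. It is not the RLS-E implementation: the abstract does not specify how model outputs are combined or how accuracy is measured, so the sketch assumes a simple majority vote and plain accuracy on a small labeled subset. The names `majority_vote`, `accuracy`, and `select_predictor` are hypothetical. For an imbalanced stream, a class-sensitive score would likely be a better choice than plain accuracy.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable mapping a feature vector to a class label.
Model = Callable[[Sequence[float]], int]


def majority_vote(models: List[Model], x: Sequence[float]) -> int:
    """Combine all models by majority vote (an assumed combination rule)."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


def accuracy(predict: Model, xs: List[Sequence[float]], ys: List[int]) -> float:
    """Fraction of labeled samples predicted correctly (assumed metric)."""
    if not xs:
        return 0.0
    return sum(1 for x, y in zip(xs, ys) if predict(x) == y) / len(xs)


def select_predictor(
    models: List[Model], xs: List[Sequence[float]], ys: List[int]
) -> Tuple[str, Model]:
    """Return the more accurate of: the combined ensemble, or any single model.

    The ensemble is listed first, so it wins ties against individual models.
    """
    candidates: List[Tuple[str, Model]] = [
        ("ensemble", lambda x: majority_vote(models, x))
    ]
    for i, m in enumerate(models):
        candidates.append((f"model_{i}", m))
    return max(candidates, key=lambda c: accuracy(c[1], xs, ys))


# Toy data: one informative model outvoted by two models that always predict 0.
models: List[Model] = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 0,
    lambda x: 0,
]
labeled_x = [[0.9], [0.8], [0.1], [0.7]]
labeled_y = [1, 1, 0, 1]

name, predictor = select_predictor(models, labeled_x, labeled_y)
print(name, predictor([0.6]))
```

In this toy case the majority vote is dominated by the two constant models, so the single informative model is selected, which mirrors the motivation given above for not relying solely on the ensemble output.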

Full Text