A semi-supervised clustering-based classification model for classifying imbalanced data streams in the presence of scarcely labelled data

Kiran Bhowmick, Meera Narvekar

doi:10.1504/ijbidm.2022.120827

Abstract

Data streams are potentially infinite in length, fast changing and scarcely labelled, and it is practically impossible to label all the observed instances. Online frameworks for classifying data streams are generally supervised: they assume that labelled data are available, and hence cannot be applied directly to scarcely labelled data streams. Semi-supervised learning (SSL) addresses this problem of scarcely labelled data by using a large amount of unlabelled data together with the labelled data to build classifiers. Data streams may also suffer from the problem of imbalanced data. Previous work on learning from data streams has analysed the problem of imbalanced data, but to the best of our knowledge, no work so far has applied semi-supervised learning approaches to classifying imbalanced data streams. This paper proposes a model that uses a semi-supervised clustering technique to classify an imbalanced data stream in the presence of scarcely labelled data. The results show that the model outperforms many state-of-the-art techniques.

Full Text