Abstract
Data streams arise in many large-scale systems in domains such as security, finance, and the internet. In many data streams the class distribution is imbalanced, so most traditional classifiers fail to achieve high accuracy on samples from the minority class. In addition, data streams change over time, and the model must be updated to maintain classification performance. However, obtaining the true class labels of the samples is not easy: labeling is extremely time-consuming, and class labels are very often unavailable immediately after classification. The goal of this research is to reduce the labeling required for an imbalanced data stream while still achieving high classification performance relative to the fully labeled setting. In an imbalanced data stream, the challenging part is finding and labeling minority-class samples. In this paper, we propose the RLS-SOM (Reduced Labeled Samples-Self Organizing Map) framework for classifying imbalanced data streams in a non-stationary environment. RLS-SOM locates minority-class samples in the feature space using a self-organizing map (SOM). It maintains an ensemble of classifiers and builds a new model when changes occur, using only partially labeled samples. In RLS-SOM, classification results are obtained from the ensemble as well as from each individual model in the ensemble. An individual model's results are selected over the ensemble's results if its performance is higher than the ensemble's, since a single model in the ensemble may outperform the ensemble as a whole. Our experimental results on benchmark data sets demonstrate that RLS-SOM achieves higher performance than several partial-labeling techniques.
In addition, experiments against state-of-the-art fully labeled methods such as UCB, SERA, SEA, and Learn++.CDS show that RLS-SOM maintains equivalent classification performance while labeling 10–30% of the samples, on average.
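
The selection step described above, where a single model's output replaces the ensemble's output when that model performs better, can be illustrated with a minimal Python sketch. This is not the RLS-SOM implementation. The abstract does not say how the ensemble combines its members or which performance measure is compared, so the majority vote, the G-mean score (geometric mean of the two per-class recalls), the binary 0/1 labels, and the function names `majority_vote`, `g_mean`, and `select_output` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote (assumed rule)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def g_mean(y_true, y_pred):
    """Geometric mean of recall on class 0 and class 1 (assumed metric)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            return 0.0  # a class is absent from the labeled batch
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sqrt(recalls[0] * recalls[1])


def select_output(preds_labeled, y_labeled, preds_new):
    """Pick the ensemble or one member, whichever scores best on labeled data.

    preds_labeled: one prediction list per model for the labeled samples
    y_labeled:     true labels of those samples
    preds_new:     one prediction list per model for the new samples
    """
    # The ensemble is inserted first, so on a tie max() keeps the ensemble:
    # a member is chosen only if its score is strictly higher.
    candidates = {
        "ensemble": (majority_vote(preds_labeled), majority_vote(preds_new))
    }
    for k, (p_lab, p_new) in enumerate(zip(preds_labeled, preds_new)):
        candidates[f"model_{k}"] = (p_lab, p_new)
    best = max(candidates, key=lambda name: g_mean(y_labeled, candidates[name][0]))
    return best, candidates[best][1]


# Toy batch: only model_0 detects the single minority sample.
y_labeled = [0, 0, 0, 1]
preds_labeled = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
preds_new = [[1, 0], [0, 0], [0, 0]]
chosen, output = select_output(preds_labeled, y_labeled, preds_new)
```

In this toy batch the majority vote misses the minority sample, so the ensemble's G-mean is 0 while `model_0` scores 1.0, and the predictions of `model_0` are returned. When no member beats the ensemble, the ensemble's vote is kept.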