Abstract

Large volumes of streaming data in the form of text and images are emerging in many real-world applications. These data streams often exhibit multiple labels per instance, missing labels, and the emergence of new classes, which challenges existing data stream classification algorithms in terms of accuracy, space, and time performance. On the one hand, most data stream classification algorithms are trained on fully labeled, single-label data, whereas in the real world labels are difficult to obtain, so unlabeled data are plentiful and labeled data are scarce. On the other hand, most existing multi-label data stream classification algorithms assume fully labeled data with no emerging new classes, and few semi-supervised methods exist. This paper therefore proposes a semi-supervised ensemble classification algorithm for multi-label data streams based on co-training. First, the algorithm uses a sliding window mechanism to partition the data stream into data chunks. On each of the first w data chunks, the co-training-based multi-label semi-supervised classification algorithm COINS is used to train a base classifier, and the resulting w COINS classifiers form an ensemble model suited to data streams with a large proportion of unlabeled data. In addition, a new-class detection mechanism is introduced: the ensemble model predicts the (w+1)-th data chunk to detect whether a new class has emerged. When a new label is detected, the classifier is retrained on the current data chunk and the ensemble model is updated. Finally, experimental results on five real data sets show that, compared with classical algorithms, the proposed approach improves the classification accuracy of multi-label data streams with a large number of missing labels and emerging new labels.
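The chunk-and-ensemble workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: `CentroidBaseClassifier` is a simple stand-in that uses labeled instances only and does not implement COINS or co-training; the names `chunk_stream`, `ChunkEnsemble`, and the `radius` parameter are hypothetical; the new-label test (a labeled instance carrying a label unknown to every ensemble member) and the policy of replacing the oldest member are placeholders, since the abstract does not specify either. Each stream item is assumed to be a pair `(features, labels)`, with `labels` set to `None` for unlabeled instances.

```python
from collections import deque
from itertools import islice


def chunk_stream(stream, chunk_size):
    """Partition an iterable stream into consecutive fixed-size chunks."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


class CentroidBaseClassifier:
    """Stand-in for a COINS base classifier (assumption: NOT the COINS algorithm).

    Learns one centroid per label from the labeled instances of a chunk.
    Unlabeled instances (labels is None) are ignored by this stand-in.
    """

    def __init__(self, chunk):
        sums, counts = {}, {}
        for x, labels in chunk:
            if labels is None:
                continue
            for lab in labels:
                acc = sums.setdefault(lab, [0.0] * len(x))
                for i, v in enumerate(x):
                    acc[i] += v
                counts[lab] = counts.get(lab, 0) + 1
        self.centroids = {
            lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()
        }

    def predict(self, x, radius):
        """Return every known label whose centroid lies within `radius` of x."""
        return {
            lab for lab, c in self.centroids.items() if euclidean(x, c) <= radius
        }


class ChunkEnsemble:
    """Ensemble of at most w base classifiers, one per data chunk."""

    def __init__(self, w, radius):
        self.members = deque(maxlen=w)  # appending to a full deque drops the oldest
        self.radius = radius

    def known_labels(self):
        """Union of the labels known to any ensemble member."""
        return set().union(*(m.centroids for m in self.members))

    def predict(self, x):
        """Keep a label if at least half of the members vote for it."""
        votes = {}
        for m in self.members:
            for lab in m.predict(x, self.radius):
                votes[lab] = votes.get(lab, 0) + 1
        return {lab for lab, n in votes.items() if 2 * n >= len(self.members)}

    def process(self, chunk):
        """Consume one chunk; return True if a new label was detected in it."""
        if len(self.members) < self.members.maxlen:
            # Warm-up: train one base classifier on each of the first w chunks.
            self.members.append(CentroidBaseClassifier(chunk))
            return False
        known = self.known_labels()
        # Placeholder detection rule (assumption, not the paper's mechanism).
        new_label = any(
            labels is not None and not set(labels) <= known
            for _, labels in chunk
        )
        if new_label:
            # Retrain on the current chunk and update the ensemble.
            self.members.append(CentroidBaseClassifier(chunk))
        return new_label
```

A caller would iterate `for chunk in chunk_stream(stream, chunk_size)` and pass each chunk to `ChunkEnsemble.process`. Using a bounded `deque` keeps the ensemble at a fixed size w, which is one simple way to limit memory on an unbounded stream; other update policies are equally compatible with the description above.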
