Abstract

Many practical applications, such as social media and monitoring systems, continuously generate streaming data, which suffers from instability, scarce labels, and multiclass imbalance. To address these problems, a cluster-based active learning method for data stream classification is proposed. First, a label query strategy based on a marginal threshold matrix selects samples that are difficult to classify or that may indicate concept drift for labeling, which addresses both the high cost of labels and the class imbalance. Second, a group of micro-clusters is maintained dynamically, and the weights of the micro-clusters in the model are adjusted so that the model reflects the current data distribution. Finally, a buffer stores new micro-clusters that take part in model updates, so that the model adapts to the new data environment. Experimental results on three real datasets and three synthetic datasets show that the method is less affected by concept drift than classical data stream classification algorithms and achieves higher classification accuracy than the online semi-supervised learning algorithm ADSM. The average accuracy on the six datasets increased by 5.56%, 2.32%, 1.77%, 1.83%, 3.78%, and 2.04%, respectively. The model processes data streams online and improves classification performance with less memory consumption.
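The abstract does not spell out the query rule or the weight update, so the following is only a minimal sketch of how the two ideas could fit together, not the authors' implementation. It assumes that the "marginal threshold matrix" holds one margin threshold per (predicted class, runner-up class) pair, that class scores come from a distance-weighted vote over micro-clusters, and that weights decay exponentially. The names `MicroCluster`, `class_scores`, `should_query`, and `decay`, as well as all parameter values, are hypothetical. The buffer for new micro-clusters is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: list          # running mean of the samples absorbed so far
    label: int            # class this micro-cluster votes for
    weight: float = 1.0   # importance in the model (assumed to decay over time)
    count: int = 1        # number of samples absorbed

    def absorb(self, x):
        """Fold a labeled sample into the cluster and refresh its weight."""
        self.count += 1
        self.center = [c + (xi - c) / self.count
                       for c, xi in zip(self.center, x)]
        self.weight = 1.0


def class_scores(clusters, x):
    """Normalized per-class vote: heavier and closer clusters count more."""
    scores = {}
    for mc in clusters:
        d = math.dist(mc.center, x)
        scores[mc.label] = scores.get(mc.label, 0.0) + mc.weight / (1.0 + d)
    total = sum(scores.values())
    if total == 0.0:
        return {}
    return {label: s / total for label, s in scores.items()}


def should_query(scores, thresholds):
    """Request a label when the top-two margin is below the threshold
    stored for that (predicted, runner-up) class pair (an assumption)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return False  # no margin can be computed with fewer than two classes
    top, second = ranked[0], ranked[1]
    margin = scores[top] - scores[second]
    return margin < thresholds[top][second]


def decay(clusters, rate=0.9, min_weight=0.1):
    """Shrink every weight and drop clusters that have become stale."""
    for mc in clusters:
        mc.weight *= rate
    return [mc for mc in clusters if mc.weight >= min_weight]
```

In this sketch, a sample lying midway between two micro-clusters of different classes has a margin near zero and would be sent for labeling, while a sample sitting on a cluster center would not. Using a per-pair threshold rather than a single global one is one way a minority class could be given a larger threshold so that it is queried more often.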


