Abstract

Most existing methods for multi-label data stream classification extend single-label stream classification approaches to the multi-label case without considering the special characteristics of multi-label stream data, such as label dependency, concept drift, and recurrent concepts. Motivated by these challenges, we devise an efficient ensemble paradigm for multi-label data stream classification. The algorithm deploys a novel change detection method based on Jensen–Shannon divergence to identify different kinds of concept drift in data streams. Moreover, our method accounts for label dependency by pruning away infrequent label combinations to enhance classification performance. Empirical results on both synthetic and real-world datasets demonstrate its effectiveness.
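As a rough illustration of divergence-based change detection, the sketch below computes the Jensen–Shannon divergence between the label-set frequencies of two windows of a stream. The abstract does not say which distributions the method compares or how a change is declared, so the choice of label-set frequencies, the function names, and the `threshold` value are all assumptions made for this example rather than the authors' procedure.

```python
import math
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions given as {outcome: probability} dicts."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence KL(a || m); terms with a(k) = 0 contribute 0
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def label_distribution(window):
    """Relative frequency of each label combination in a non-empty window.
    Each element of `window` is an iterable of labels for one instance."""
    counts = Counter(frozenset(labels) for labels in window)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


def drift_detected(reference, current, threshold=0.1):
    """Flag a change when the divergence between the two windows exceeds
    `threshold` (a hypothetical value chosen only for illustration)."""
    d = js_divergence(label_distribution(reference), label_distribution(current))
    return d > threshold
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (distributions with no outcome in common), which makes a fixed threshold easier to interpret than with an unbounded measure such as the Kullback-Leibler divergence.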

Highlights

  • In recent years, sensor networks [1], spam filtering [2], intrusion detection [3], and credit card fraud detection [4] have given rise to new applications that work on continuously arriving data, known as data streams [5]

  • To meet these challenges, we develop an efficient ensemble scheme for multi-label data streams that accounts for label dependencies and handles different types of concept drift

  • This paper introduces four popular performance metrics for multi-label data stream classification: Hamming loss, subset accuracy, F1, and log-loss
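Two of these metrics have a single standard definition and can be sketched directly. The example below assumes labels are encoded as equal-length 0/1 vectors, one per instance; the function names are chosen for this illustration. F1 and log-loss are left out because they come in several multi-label variants (for example, example-based, micro-averaged, or macro-averaged F1), and the text here does not say which one is used.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong,
    taken over all instances and all label positions."""
    n_labels = len(y_true[0])
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / (len(y_true) * n_labels)


def subset_accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label vector matches
    the true label vector exactly."""
    exact = sum(
        list(true_row) == list(pred_row)
        for true_row, pred_row in zip(y_true, y_pred)
    )
    return exact / len(y_true)


# Two instances, three labels; one label is wrong in the first instance.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]
print(hamming_loss(y_true, y_pred))     # 1 wrong label out of 6
print(subset_accuracy(y_true, y_pred))  # 1 exact match out of 2
```

Hamming loss gives partial credit for nearly correct predictions, whereas subset accuracy counts an instance as correct only when every label matches, so the two can rank classifiers differently.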


Summary

Introduction

Sensor networks [1], spam filtering [2], intrusion detection [3], and credit card fraud detection [4] have given rise to new applications that work on continuously arriving data, known as data streams [5]. In the data stream model, instances arrive at a high rate, and algorithms must process them under strict time and memory constraints [6]. Traditional methods focus on classifying data streams in single-label scenarios, where each instance is associated with a single label. However, many real-world applications involve multi-label data streams, in which an instance can carry several labels at once. Multi-label stream classification is a non-trivial task because traditional multi-label classification approaches assume a batch setting. An important feature of multi-label data streams is concept drift, i.e., the underlying distribution of the data may change over time. Such changes can degrade the predictive accuracy of classifiers.

