Abstract

Online class imbalance learning is a new learning problem that combines the challenges of both online learning and class imbalance learning. It deals with data streams that have very skewed class distributions. Problems of this type commonly exist in real-world applications, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. In our earlier work, we defined the problem of online class imbalance and proposed two learning algorithms, Oversampling-based Online Bagging (OOB) and Undersampling-based Online Bagging (UOB), that build an ensemble model to overcome class imbalance in real time through resampling and time-decayed metrics. In this paper, we further improve the resampling strategy inside OOB and UOB and examine their performance in both static and dynamic data streams. We give the first comprehensive analysis of class imbalance in data streams in terms of data distributions, imbalance rates and changes in class imbalance status. We find that UOB is better at recognizing minority-class examples in static data streams, whereas OOB is more robust against dynamic changes in class imbalance status. The data distribution is a major factor affecting their performance. Based on these insights, we then propose two new ensemble methods, called WEOB1 and WEOB2, that maintain both OOB and UOB with adaptive weights for final predictions. They are shown to combine the strengths of OOB and UOB, with good accuracy and robustness.
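To illustrate the time-decayed metric mentioned in the abstract, here is a minimal Python sketch of a decayed class-size estimate. It assumes the update w_k(t) = theta * w_k(t-1) + (1 - theta) * [y_t = k], with a decay factor theta in (0, 1) and all sizes starting at zero; the class name DecayedClassSize and the default theta = 0.9 are hypothetical choices for illustration, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable


class DecayedClassSize:
    """Hypothetical tracker of time-decayed class sizes for an online stream.

    Assumed update per class k when an example with label y_t arrives:
        w_k(t) = theta * w_k(t-1) + (1 - theta) * [y_t == k]
    Only one float per class is kept; no past examples or windows are stored.
    """

    def __init__(self, classes: Iterable[Hashable], theta: float = 0.9) -> None:
        if not 0.0 < theta < 1.0:
            raise ValueError("theta must lie in (0, 1)")
        self.theta = theta
        # Assumption: every class starts with size 0.
        self.size: Dict[Hashable, float] = {c: 0.0 for c in classes}

    def update(self, label: Hashable) -> None:
        # Decay every class, then credit the class of the newly arrived example.
        for c in self.size:
            hit = 1.0 if c == label else 0.0
            self.size[c] = self.theta * self.size[c] + (1.0 - self.theta) * hit

    def minority(self) -> Hashable:
        # The class with the smallest decayed size is treated as the minority.
        return min(self.size, key=self.size.get)


est = DecayedClassSize([0, 1], theta=0.9)
for y in [0, 0, 0, 1]:
    est.update(y)
print(est.size, est.minority())
```

Because theta controls how quickly old examples are forgotten, a smaller theta lets the estimate react faster when the imbalance status changes, at the cost of noisier estimates.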

Highlights

  • Online class imbalance learning is an emerging topic that is attracting growing attention

  • The advantages of Oversampling-based Online Bagging (OOB) and Undersampling-based Online Bagging (UOB) are: 1) resampling is algorithm-independent, which allows any type of online classifier to be used; 2) the time-decayed class size used in OOB and UOB dynamically estimates the imbalance status without storing old data or using windows, and adaptively decides the resampling rate at each time step; 3) like other ensemble methods, they combine the predictions of multiple classifiers, which are expected to be more accurate than a single classifier.

  • We focus on the fundamental issue of class imbalance and examine the following questions under different imbalanced scenarios: 1) To what extent does resampling in OOB and UOB help to deal with class imbalance online? 2) How do they perform in comparison with other state-of-the-art algorithms? 3) How are they affected by different types of class imbalance and base classifiers? For the first question, we compare OOB and UOB with Online Bagging (OB) [8] to show the effectiveness of resampling.
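The resampling idea behind these highlights can be sketched as follows. As in Online Bagging (OB), each base learner trains on an arriving example K times, with K drawn from a Poisson distribution; OB uses rate 1. The sketch assumes that OOB raises the rate for smaller classes to w_max / w_y and that UOB lowers it for larger classes to w_min / w_y, where the w values are time-decayed class sizes. These rate formulas and the function names resampling_rate and copies_per_learner are illustrative assumptions, not the paper's exact definitions.

```python
import math
import random
from typing import Dict, Hashable, List


def poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (moderate lam only)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def resampling_rate(method: str, label: Hashable, class_size: Dict[Hashable, float]) -> float:
    """Hypothetical Poisson rate for one example under OB, OOB or UOB."""
    if method == "OB":
        return 1.0  # plain Online Bagging: no class-dependent resampling
    w_y = class_size.get(label, 0.0)
    if w_y <= 0.0:
        return 1.0  # no size estimate for this class yet: fall back to OB
    seen = [w for w in class_size.values() if w > 0.0]
    if method == "OOB":
        return max(seen) / w_y  # >= 1, so smaller classes are oversampled
    if method == "UOB":
        return min(seen) / w_y  # <= 1, so larger classes are undersampled
    raise ValueError(f"unknown method: {method}")


def copies_per_learner(method: str, label: Hashable, class_size: Dict[Hashable, float],
                       n_learners: int, rng: random.Random) -> List[int]:
    """How many times each base learner trains on the current example."""
    lam = resampling_rate(method, label, class_size)
    return [poisson(lam, rng) for _ in range(n_learners)]


sizes = {0: 0.9, 1: 0.1}  # e.g. decayed class sizes on a roughly 9:1 stream
rng = random.Random(42)
print(resampling_rate("OOB", 1, sizes))  # rate for a minority-class example
print(resampling_rate("UOB", 0, sizes))  # rate for a majority-class example
print(copies_per_learner("OOB", 1, sizes, n_learners=5, rng=rng))
```

Because OB always uses rate 1, comparing it with OOB and UOB isolates the effect of class-dependent resampling, which is the first question listed above.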


Summary

INTRODUCTION

Online class imbalance learning is an emerging topic that is attracting growing attention. Although online learning and class imbalance learning have each been well studied in the literature, the combined problem has received little discussion, even though it is common in real-world applications such as intrusion detection in computer networks and fault diagnosis of control monitoring systems [4]. When online learning and class imbalance occur together, new challenges and research questions arise with regard to prediction accuracy on the minority class and adaptivity to dynamic environments. Based on the results of our analysis, we propose two ensemble strategies, called WEOB1 and WEOB2, that maintain both OOB and UOB with adaptive weight adjustment to achieve better accuracy and robustness in dynamic scenarios.
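WEOB1 and WEOB2 keep both OOB and UOB and weight their predictions adaptively. The exact weighting rules are not given in this summary, so the following Python sketch is one hypothetical reading: each ensemble is scored by a time-decayed G-mean (the geometric mean of per-class recalls, each recall updated with an assumed decay factor eta), and the two ensembles' class supports are combined either with normalized scores as weights or by trusting only the better-scoring ensemble. The names DecayedRecall and weighted_prediction and the default eta = 0.9 are assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable


class DecayedRecall:
    """Hypothetical per-class recall with exponential forgetting (labels known in advance)."""

    def __init__(self, classes: Iterable[Hashable], eta: float = 0.9) -> None:
        self.eta = eta
        self.recall: Dict[Hashable, float] = {c: 0.0 for c in classes}

    def update(self, label: Hashable, prediction: Hashable) -> None:
        # Only the recall of the true class changes when an example arrives.
        hit = 1.0 if prediction == label else 0.0
        self.recall[label] = self.eta * self.recall[label] + (1.0 - self.eta) * hit

    def gmean(self) -> float:
        # Geometric mean of per-class recalls; 0 if any class is never recognized.
        prod = 1.0
        for r in self.recall.values():
            prod *= r
        return prod ** (1.0 / len(self.recall))


def weighted_prediction(
    supports: Dict[str, Dict[Hashable, float]],
    scores: Dict[str, float],
    winner_takes_all: bool = False,
) -> Hashable:
    """Combine class supports from several ensembles (e.g. OOB and UOB)."""
    if winner_takes_all:
        best = max(scores, key=scores.get)
        weights = {m: (1.0 if m == best else 0.0) for m in scores}
    else:
        total = sum(scores.values())
        n = len(scores)
        # Normalized scores as weights; equal weights if every score is zero.
        weights = {m: (s / total if total > 0 else 1.0 / n) for m, s in scores.items()}
    combined: Dict[Hashable, float] = {}
    for model, support in supports.items():
        for label, s in support.items():
            combined[label] = combined.get(label, 0.0) + weights[model] * s
    return max(combined, key=combined.get)


supports = {"OOB": {0: 0.7, 1: 0.3}, "UOB": {0: 0.4, 1: 0.6}}
scores = {"OOB": 0.5, "UOB": 0.6}
print(weighted_prediction(supports, scores))                         # normalized weights
print(weighted_prediction(supports, scores, winner_takes_all=True))  # best model only
```

In this example the normalized weights favour class 0, whereas trusting only the better-scoring UOB yields class 1, which shows how the two weighting rules can disagree.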

Defining Class Imbalance
Online Solutions OOB and UOB
Existing Research
IMPROVED OOB AND UOB
CLASS IMBALANCE ANALYSIS IN STATIC DATA STREAMS
Data Description and Experimental Settings
Settings
Role of Resampling
Comparison with Other Algorithms
Factorial Analysis of Data-Related Factors and Base Classifiers
CLASS IMBALANCE ANALYSIS IN DYNAMIC DATA STREAMS
Role of the Time-Decayed Metric
Factorial Analysis of the Decay Factor
ENSEMBLES OF OOB AND UOB
Weighted Ensemble of OOB and UOB
Findings
CONCLUSIONS