Abstract

As an emerging research topic, online class imbalance learning often combines the challenges of both class imbalance and concept drift. It deals with data streams having very skewed class distributions, where concept drift may occur. It has recently received increased research attention; however, very little work addresses the combined problem where both class imbalance and concept drift coexist. As the first systematic study of handling concept drift in class-imbalanced data streams, this paper first provides a comprehensive review of current research progress in this field, including current research focuses and open challenges. Then, an in-depth experimental study is performed, with the goal of understanding how to best overcome concept drift in online learning with class imbalance.

Highlights

  • With the wide application of machine learning algorithms to the real world, class imbalance and concept drift have become crucial learning issues

  • The major contributions of this paper include: 1) this is the first comprehensive study that looks into concept drift detection in class-imbalanced data streams; 2) data problems are categorized into different types of concept drifts and class imbalances with illustrative applications; 3) existing approaches are compared and analyzed systematically in each type; 4) pros and cons of each approach are investigated; 5) the results provide guidance for choosing the appropriate technique and developing better algorithms for future learning tasks; and 6) this is the first work exploring the role of class imbalance techniques in concept drift detection, which sheds light on whether and how to tackle class imbalance and concept drift simultaneously

  • The True Detection Rate (TDR), the number of False Alarms (FAs), and the Delay of Detection (DoD) are calculated in the same way for both the abrupt and the gradual drifting cases, based on the following understanding: before a real concept drift occurs, all reported alarms are counted as FAs; after a real concept drift starts, the first detection is taken as the true drift detection; after that and before the next real concept drift, any subsequent detections are counted as FAs
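
The counting rule for TDR, FA, and DoD described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' evaluation code: the function name `score_detections`, its arguments, and the choice to report TDR as the fraction of drifts detected within a single stream are assumptions made here. An alarm raised exactly at the drift start is treated as occurring after the drift starts, which is also an assumption.

```python
from bisect import bisect_right


def score_detections(drift_starts, detections):
    """Score drift alarms against known drift start times (hypothetical helper).

    drift_starts: sorted time steps at which each real drift begins.
    detections:   time steps at which the detector raised an alarm.
    Returns (true_detection_rate, false_alarms, delays).
    """
    false_alarms = 0
    first_hit = {}  # drift index -> time step of its first detection
    for t in sorted(detections):
        # Index of the most recent drift that started at or before t.
        k = bisect_right(drift_starts, t) - 1
        if k < 0:
            false_alarms += 1      # alarm before any real drift: FA
        elif k not in first_hit:
            first_hit[k] = t       # first alarm after drift k starts: true detection
        else:
            false_alarms += 1      # later alarm before the next drift: FA
    # Delay of detection: steps between drift start and its true detection.
    delays = [first_hit[k] - drift_starts[k] for k in sorted(first_hit)]
    # Assumption: TDR is the fraction of drifts detected in this one stream.
    tdr = len(first_hit) / len(drift_starts) if drift_starts else 0.0
    return tdr, false_alarms, delays


# Two drifts (at steps 500 and 1000) and five alarms.
print(score_detections([500, 1000], [120, 530, 700, 1040, 1100]))
```

For a gradual drift, `drift_starts` would hold the time step at which the gradual change begins, so the same function covers both cases.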


Summary

A Systematic Study of Online Class Imbalance Learning With Concept Drift

Shuo Wang, Member, IEEE, Leandro L. Minku, Member, IEEE, and Xin Yao, Fellow, IEEE

As an emerging research topic, online class imbalance learning often combines the challenges of both class imbalance and concept drift. It deals with data streams having very skewed class distributions, where concept drift may occur. It has recently received increased research attention; however, very little work addresses the combined problem where both class imbalance and concept drift coexist. An in-depth experimental study is performed, with the goal of understanding how to best overcome concept drift in online learning with class imbalance.

INTRODUCTION
ONLINE LEARNING FRAMEWORK WITH CLASS IMBALANCE AND CONCEPT DRIFT
Learning Procedure
Problem Descriptions
OVERCOMING CLASS IMBALANCE AND CONCEPT DRIFT SIMULTANEOUSLY
Illustrative Applications
Approaches to Tackling Both Class Imbalance and Concept Drift
PERFORMANCE ANALYSIS
Data Sets
Experimental and Evaluation Settings
Comparative Study on Artificial Data
Comparative Study on Real-World Data
Further Discussion
Findings
CONCLUSION
