Abstract

An enormous and ever-growing volume of data now arrives sequentially in a wide range of real-world applications. Learning in nonstationary environments is a major challenge, and the problem becomes considerably more complex in the presence of class imbalance. We provide new insights into learning from nonstationary and imbalanced data in online learning, a largely unexplored area. We propose the novel Adaptive REBAlancing (AREBA) algorithm, which selectively includes in the training set a subset of the majority and minority examples that have appeared so far; at its heart lies an adaptive mechanism that continually maintains the class balance between the selected examples. We compare AREBA with strong baselines and other state-of-the-art algorithms in extensive experiments covering various class imbalance rates and different concept drift types, on both synthetic and real-world data. AREBA significantly outperforms the other methods with respect to both learning speed and learning quality. Our code is made publicly available to the scientific community.
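The selection idea described above can be illustrated with a minimal sketch. It is not the AREBA algorithm itself, whose details are not given in this section. It assumes one first-in-first-out queue per class, a shared memory budget split equally between the classes, and a time-decayed estimate of class frequencies used to decide which class is currently the minority. The names `BalancedMemory`, `budget`, `decay`, `add` and `training_set` are hypothetical and chosen for illustration only.

```python
from collections import deque


class BalancedMemory:
    """Illustrative per-class memory; NOT the authors' AREBA implementation.

    Assumptions (not from the source): equal budget share per class,
    time-decayed class-frequency estimates, oldest examples dropped first.
    """

    def __init__(self, budget=20, classes=(0, 1), decay=0.99):
        self.share = budget // len(classes)  # maximum queue length per class
        if self.share < 1:
            raise ValueError("budget must allow at least one example per class")
        self.decay = decay
        self.queues = {c: deque() for c in classes}
        self.sizes = {c: 0.0 for c in classes}  # decayed class frequencies

    def add(self, x, y):
        """Store example x with label y, then trim queues to restore balance."""
        for c in self.sizes:
            self.sizes[c] = self.decay * self.sizes[c] + (1.0 - self.decay) * (c == y)
        self.queues[y].append(x)

        # The class with the smallest decayed frequency is treated as the minority.
        minority = min(self.sizes, key=self.sizes.get)
        n_min = len(self.queues[minority])
        # No queue may be longer than the minority queue (once it is non-empty)
        # or than its share of the budget.
        cap = min(self.share, n_min) if n_min else self.share
        for q in self.queues.values():
            while len(q) > cap:
                q.popleft()  # discard the oldest example first

    def training_set(self):
        """Return the currently selected (example, label) pairs."""
        return [(x, c) for c, q in self.queues.items() for x in q]


# Usage: a stream in which one example in ten belongs to class 1.
memory = BalancedMemory(budget=20)
for t in range(200):
    label = 1 if t % 10 == 9 else 0
    memory.add(float(t), label)
counts = {c: len(q) for c, q in memory.queues.items()}
```

Because the decayed frequencies follow the stream, the class treated as the minority can change after a drift in the class prior, and the cap on the other queue changes with it. A classifier would then be updated on `training_set()` at each step; the choice of classifier is left open here.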

Highlights

  • Efficient and effective analysis methods for the ever-increasing volume of sequential data in a wide range of applications are of paramount importance.

  • We provide new insights into learning from nonstationary and imbalanced data, a largely unexplored area that focuses on the combined challenges of class imbalance and concept drift in online learning.

  • We propose the novel Adaptive REBAlancing (AREBA) algorithm that maintains the aforementioned desired properties.

Summary

INTRODUCTION

Efficient and effective analysis methods for the ever-increasing volume of sequential data in a wide range of applications are of paramount importance. In such environments, a classifier with learning capabilities is of vital importance as it will provide an adaptive behavior and help maintain optimal performance. We address the combined challenges of drift and imbalance in online (or one-by-one) learning, i.e., when a single example arrives at each step. The desired properties of an online classifier learning from nonstationary and imbalanced data are as follows [4], [7]. 1) We provide new insights into learning from nonstationary and imbalanced data, a largely unexplored area that focuses on the combined challenges of class imbalance and concept drift in online learning.

BACKGROUND
RELATED WORK
Concept Drift
Class Imbalance
Open Challenges
PROPOSED METHOD
Queue-Based Resampling
Adaptive Rebalancing
EXPERIMENTAL SETUP
Data Sets
Performance Metrics
Evaluation Method
Role of the Adaptive Rebalancing Mechanism
Role of the Memory Size
Stationary Data
Nonstationary Data
Data With Noisy Class Labels
Real-World Data
Dual Nature
QBR Versus AREBA
Choice of Classifier
Findings
Verification Latency