Abstract
An enormous and ever-growing volume of data is now becoming available in a sequential fashion in various real-world applications. Learning in nonstationary environments constitutes a major challenge, and this problem becomes orders of magnitude more complex in the presence of class imbalance. We provide new insights into learning from nonstationary and imbalanced data in online learning, a largely unexplored area. We propose the novel Adaptive REBAlancing (AREBA) algorithm, which selectively includes in the training set a subset of the majority and minority examples that have appeared so far; at its heart lies an adaptive mechanism that continually maintains the class balance between the selected examples. We compare AREBA with strong baselines and other state-of-the-art algorithms, and perform extensive experiments in scenarios with various class imbalance rates and different concept drift types on both synthetic and real-world data. AREBA significantly outperforms the other methods with respect to both learning speed and learning quality. Our code is made publicly available to the scientific community.
Highlights
Efficient and effective analysis methods for the ever-increasing volume of sequential data in a wide range of applications are of paramount importance
1) We provide new insights into learning from nonstationary and imbalanced data, a largely unexplored area that focuses on the combined challenges of class imbalance and concept drift in online learning
2) We propose the novel Adaptive REBAlancing (AREBA) algorithm that maintains the desired properties of an online classifier learning from nonstationary and imbalanced data
Summary
Efficient and effective analysis methods for the ever-increasing volume of sequential data in a wide range of applications are of paramount importance. In such environments, a classifier with learning capabilities is vital, as it provides adaptive behavior and helps maintain optimal performance. We address the combined challenges of drift and imbalance in online (or one-by-one) learning, i.e., when a single example arrives at each step. The desired properties of an online classifier learning from nonstationary and imbalanced data are given in [4], [7]. We provide new insights into learning from nonstationary and imbalanced data, a largely unexplored area that focuses on the combined challenges of class imbalance and concept drift in online learning.
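The idea of training on a class-balanced subset of the examples seen so far, with one example arriving per step, can be illustrated with a minimal Python sketch. This is not the AREBA algorithm itself: the adaptive mechanism is not specified in this summary, so the sketch assumes a simple scheme with one bounded first-in-first-out queue per class and a training set built from the same number of most recent examples of each class. The names `BalancedBuffer` and `capacity_per_class` are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Any, Deque, Dict, Hashable, List, Tuple


class BalancedBuffer:
    """Illustrative only (an assumption, not the published method):
    one bounded FIFO queue per class, read out in a balanced way."""

    def __init__(self, capacity_per_class: int) -> None:
        if capacity_per_class < 1:
            raise ValueError("capacity_per_class must be at least 1")
        self.capacity = capacity_per_class
        self.queues: Dict[Hashable, Deque[Any]] = {}

    def add(self, x: Any, y: Hashable) -> None:
        """Store one arriving example; the oldest example of the
        same class is discarded once that class's queue is full."""
        if y not in self.queues:
            self.queues[y] = deque(maxlen=self.capacity)
        self.queues[y].append(x)

    def training_set(self) -> List[Tuple[Any, Hashable]]:
        """Return the k most recent examples of every class seen so far,
        where k is the length of the shortest queue."""
        if not self.queues:
            return []
        k = min(len(q) for q in self.queues.values())
        selected: List[Tuple[Any, Hashable]] = []
        for label, q in self.queues.items():
            selected.extend((x, label) for x in list(q)[-k:])
        return selected


if __name__ == "__main__":
    buf = BalancedBuffer(capacity_per_class=3)
    # A stream in which class 1 is the minority; x is just the step index.
    for step, label in enumerate([0, 0, 0, 0, 1, 0, 0, 1, 0, 1]):
        buf.add(step, label)
    print(buf.training_set())
```

Bounding each queue discards old examples, which is one simple way to let a model follow concept drift, while the balanced readout keeps the majority class from dominating each update. In an online setting, a classifier would be updated incrementally on `training_set()` after each call to `add`.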