Abstract

Streaming Machine Learning (SML) studies algorithms that update their models in a single pass over an unbounded and often non-stationary flow of data. Online class imbalance learning is a branch of SML that combines the challenges of both class imbalance and concept drift. In this paper, we address the binary classification problem by rebalancing an imbalanced data stream in the presence of concept drift, accessing one sample at a time. We propose an extensive comparative study of the Continuous Synthetic Minority Oversampling Technique (C-SMOTE), inspired by the popular SMOTE sampling technique, used as a meta-strategy to pipeline with SML classification algorithms. We benchmark C-SMOTE pipelines on both synthetic and real data streams containing different types of concept drift, different imbalance levels, and different class distributions. We provide statistical evidence that models learnt with C-SMOTE pipelines improve minority-class performance with respect to both the baseline models and state-of-the-art methods. We also perform a sensitivity analysis to assess the impact of C-SMOTE on majority-class performance across the three types of concept drift and several class distributions. Moreover, we present a computational cost analysis in terms of time and memory consumption.
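To make the pipelining idea concrete, the following is a minimal sketch, not the C-SMOTE algorithm as specified in the paper, of how a SMOTE-style oversampling wrapper could sit in front of an incremental scikit-learn classifier and process one labelled sample at a time. The class name StreamingSmotePipeline, the bounded minority buffer, the neighbour count k, and the "rebalance until the running class counts match" rule are illustrative assumptions.

```python
# Hypothetical SMOTE-style rebalancing wrapper around an incremental learner.
# This is an illustrative sketch, not the C-SMOTE algorithm from the paper.
from collections import deque

import numpy as np
from sklearn.linear_model import SGDClassifier


class StreamingSmotePipeline:
    def __init__(self, base_learner, minority_label=1, buffer_size=200, k=5):
        self.learner = base_learner
        self.minority_label = minority_label          # assumes binary labels {0, 1}
        self.buffer = deque(maxlen=buffer_size)       # recent minority samples
        self.counts = {0: 0, 1: 0}                    # running class counts
        self.k = k                                    # neighbours used for interpolation

    def _synthesize(self, x):
        # SMOTE-style interpolation between x and one of its k nearest
        # stored minority neighbours (index 0 is x itself, so it is skipped).
        pool = np.array(self.buffer)
        dists = np.linalg.norm(pool - x, axis=1)
        neighbours = pool[np.argsort(dists)[1 : self.k + 1]]
        nb = neighbours[np.random.randint(len(neighbours))]
        return x + np.random.rand() * (nb - x)

    def learn_one(self, x, y):
        """Update the model with one labelled sample (y in {0, 1})."""
        x_arr = np.asarray(x, dtype=float)
        self.counts[y] += 1
        self.learner.partial_fit([x_arr], [y], classes=[0, 1])
        if y == self.minority_label:
            self.buffer.append(x_arr)
            # Generate synthetic minority samples until the running class
            # counts are balanced (only once enough neighbours are stored).
            while (len(self.buffer) > self.k
                   and self.counts[self.minority_label] < self.counts[1 - self.minority_label]):
                x_syn = self._synthesize(x_arr)
                self.learner.partial_fit([x_syn], [self.minority_label])
                self.counts[self.minority_label] += 1

    def predict_one(self, x):
        return self.learner.predict([np.asarray(x, dtype=float)])[0]


# Toy usage: a heavily imbalanced stream of 2-D points (~5% minority class).
rng = np.random.default_rng(0)
pipe = StreamingSmotePipeline(SGDClassifier(), minority_label=1)
for _ in range(1000):
    y = int(rng.random() < 0.05)
    x = rng.normal(loc=2.0 * y, scale=1.0, size=2)
    pipe.learn_one(x, y)
print(pipe.predict_one([2.0, 2.0]))
```

The bounded buffer keeps memory finite on an unbounded stream; a full implementation would also need drift-aware management of the stored minority samples, which this sketch deliberately omits.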
