Abstract

Classifying streaming data is one of the most critical data analysis tasks, and it is often complicated by concept drift, i.e., changes over time in the probabilistic characteristics of the decision model. Such tasks arise in practice in banking, medicine, and cybersecurity, to name only a few domains. A vital characteristic of these problems is that the classes of interest (e.g., fraudulent transactions, threats, or serious diseases) are usually infrequent, which hinders the design of the classification system. This paper presents a novel algorithm, DSCB (Deterministic Sampling Classifier with weighted Bagging), which employs data preprocessing methods and a weighted bagging technique to classify non-stationary imbalanced data streams. It builds models based on each incoming data chunk, but it also takes previously arrived instances into account. The proposed approach has been evaluated in a wide range of computer experiments carried out on real and artificially generated data streams with various imbalance ratios, label noise levels, and concept drift types. The results confirm that the weighted bagging ensemble coupled with data preprocessing can outperform state-of-the-art methods.
