Abstract

With the growing range of applications for Big Data streaming, class imbalance and concept drift have both become crucial learning issues. Concept drift handling solutions are sensitive to class imbalance. Sampling techniques are widely applied to process continuously arriving data streams using a sufficient number of instances, and the selected instances must support statistical inference under an imbalanced class distribution. A stream classification model without concept drift adaptation is poorly suited to imbalanced class distributions. To address these issues, this article presents a dynamic sampling and ensemble classification technique named Handling Imbalanced Data with Concept Drift (HIDC). To provide high statistical precision over an imbalanced class distribution with concept drift, HIDC estimates an optimal reservoir size from two inputs: a metric describing the statistical properties of the stream data, and a control parameter. The former refers to the level of inequality in the values of the instances arriving from a source, and the latter controls the selection of instances from multiple sources. To select appropriate instances within the allocated reservoir size, HIDC applies random sampling over the imbalanced classes and chooses a set of instances from multiple sources. Random sampling alone cannot resolve the imbalanced distribution among the existing classes, so HIDC additionally applies resampling techniques driven by the imbalance factor. To identify and adapt to new concepts, the HIDC sampling model trains a candidate classifier and replaces the worst ensemble member with it. Experimental results show that HIDC improves sampling and mining over imbalanced class distributions with concept drift.
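The pipeline outlined in the abstract (reservoir sampling, resampling by imbalance factor, replacement of the worst ensemble member) can be illustrated with a minimal Python sketch. It is not the HIDC implementation: the abstract does not give the formula for the optimal reservoir size, so the size `k` is simply passed in; the imbalance factor is assumed here to be the majority-to-minority count ratio; the reservoir step uses the classic Algorithm R; and the resampling step is plain random oversampling. All function names are hypothetical.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform sample of up to k items from a stream.

    k stands in for the optimal reservoir size, which is assumed given.
    """
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


def imbalance_factor(labels):
    """Assumed definition: largest class count divided by smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def oversample_minority(samples, rng):
    """Random oversampling: duplicate instances of the smaller classes
    until every class present matches the size of the largest one."""
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append((x, y))
    target = max(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced


def replace_worst(ensemble, scores, candidate, candidate_score):
    """Replace the lowest-scoring ensemble member with the candidate."""
    worst = min(range(len(ensemble)), key=lambda i: scores[i])
    ensemble[worst] = candidate
    scores[worst] = candidate_score
    return ensemble, scores


# Usage on a synthetic stream in which roughly 10% of instances are class 1.
rng = random.Random(0)
stream = (((i,), 1 if i % 10 == 0 else 0) for i in range(1000))
sample = reservoir_sample(stream, 100, rng)
balanced = oversample_minority(sample, rng)
```

Scoring of ensemble members is left abstract: `scores` can hold any per-member quality measure evaluated on recent data, with higher values meaning better.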
