Abstract

The growing volume of real-time data has increased processing requirements, and processing such data poses several challenges. One of the major challenges is handling the implicit data imbalance. This paper proposes a two-phase stacking ensemble method to handle data imbalance more effectively during classification. In the first phase, the proposed model uses multiple classifier algorithms to generate predictions. These predictions serve as input to the second phase, a meta-learner that operates on the predictions rather than on the actual data. Experiments were conducted on data with varied imbalance levels. The results indicate high efficiency of the proposed model when predicting on imbalanced data. A comparison with a state-of-the-art model indicates improved performance.
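The two-phase flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's configuration: the three base learners (nearest centroid, one-feature threshold rule, 1-nearest-neighbour), the lookup-table meta-learner, the binary 0/1 labels, and the toy imbalanced data. The abstract does not say which classifiers or meta-learner the authors use.

```python
from collections import Counter, defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_centroid(X, y):
    """Base learner (assumed): predict the class with the nearest centroid."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))

def train_stump(X, y, feature=0):
    """Base learner (assumed): threshold one feature at the midpoint of the class means."""
    mean = {}
    for c in (0, 1):  # binary labels 0/1 are assumed
        vals = [r[feature] for r, l in zip(X, y) if l == c]
        mean[c] = sum(vals) / len(vals)
    threshold = (mean[0] + mean[1]) / 2
    hi_class = 1 if mean[1] > mean[0] else 0
    return lambda row: hi_class if row[feature] > threshold else 1 - hi_class

def train_1nn(X, y):
    """Base learner (assumed): copy the label of the closest training point."""
    return lambda row: y[min(range(len(X)), key=lambda j: sq_dist(row, X[j]))]

def train_meta(pred_rows, y):
    """Phase 2: learn a mapping from tuples of base predictions to labels."""
    votes = defaultdict(Counter)
    for preds, label in zip(pred_rows, y):
        votes[preds][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
    def predict(preds):
        # Fall back to a plain majority vote for unseen prediction tuples.
        return table.get(preds, Counter(preds).most_common(1)[0][0])
    return predict

# Toy imbalanced data (illustrative only): class 0 is the majority class.
X_base = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (2, 1.5), (5, 5), (6, 5)]
y_base = [0, 0, 0, 0, 0, 0, 1, 1]
X_meta = [(1, 1.5), (2, 2.5), (1.5, 1), (2.5, 2), (5, 6), (5.5, 5)]
y_meta = [0, 0, 0, 0, 1, 1]

# Phase 1: fit the base classifiers on the base split.
base_models = [train_centroid(X_base, y_base),
               train_stump(X_base, y_base, feature=0),
               train_1nn(X_base, y_base)]

def base_predictions(row):
    """Tuple of phase-1 predictions for a single instance."""
    return tuple(model(row) for model in base_models)

# Phase 2: fit the meta-learner on predictions for held-out data, not on raw features.
meta_model = train_meta([base_predictions(r) for r in X_meta], y_meta)

def stack_predict(row):
    """Full two-phase prediction: base predictions, then meta-learner."""
    return meta_model(base_predictions(row))

print(stack_predict((1.2, 1.8)), stack_predict((5.5, 5.5)))
```

The meta-learner here is fitted on a held-out split so that it sees predictions the base classifiers did not train on; out-of-fold predictions from cross-validation are a common alternative.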

Highlights

  • Classification is a category of the data mining domain that deals with the supervised identification of class labels, given a large training dataset

  • This paper presents a two-phase stacking model to effectively handle data imbalances

  • This paper proposes a two-phase stacking ensemble technique aimed at countering data imbalances in benchmark datasets taken from the UCI and KEEL repositories

Summary

INTRODUCTION

Classification is a category of the data mining domain that deals with the supervised identification of class labels, given a large training dataset. Classifier performance is usually hindered by several intrinsic properties of the data and its distribution. One such major issue, found in many real-time datasets, is data imbalance [1]. Because the majority classes contain a huge number of instances, the classifier is overtrained on them, and because the minority classes contain few instances, it receives little training on them. This biased training leads to poor predictions.
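The effect of this bias can be seen in a small worked example. The sketch below assumes a hypothetical 95:5 class split and an extreme classifier that always predicts the majority class; the numbers are illustrative and are not results from the paper. It shows why overall accuracy can look high while the minority class is never detected.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of instances of class `positive` that were predicted as `positive`."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == positive]
    hits = sum(1 for t, p in relevant if p == positive)
    return hits / len(relevant)

# Hypothetical imbalanced labels: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5

# A degenerate classifier biased entirely toward the majority class.
y_pred = [0] * len(y_true)

print("accuracy:", accuracy(y_true, y_pred))            # 0.95
print("minority recall:", recall(y_true, y_pred, 1))    # 0.0
print("majority recall:", recall(y_true, y_pred, 0))    # 1.0
```

Per-class measures such as minority-class recall therefore reveal poor predictions that a single accuracy figure hides.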
