TWO-PHASE STACKING ENSEMBLE TO EFFECTIVELY HANDLE DATA IMBALANCES IN CLASSIFICATION PROBLEMS

K Madasamy

doi:10.26483/ijarcs.v9i1.5495

Abstract

The growing generation of real-time data has increased processing requirements, and processing such data poses several challenges. One of the major challenges is handling the data imbalance inherent in much real-time data. This paper proposes a two-phase stacking ensemble method that handles data imbalance more effectively during classification. In the first phase, the proposed model uses multiple classifier algorithms to make predictions on the data, and these predictions serve as the input to the second phase. The second phase is a meta-learner that operates on the first-phase predictions rather than on the original data. Experiments were conducted on data with varied levels of imbalance. The results indicate that the proposed model is highly effective at prediction on imbalanced data, and a comparison with a state-of-the-art model shows improved performance.
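The two phases described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not name the classifiers, so the three base learners (nearest centroid, 1-nearest-neighbour, decision stump), the lookup-table meta-learner, the 3-fold stratified out-of-fold scheme, and all function names (for example `fit_two_phase_stack`) are hypothetical choices made for this example. The key point it shows is that the meta-learner is trained on tuples of base-learner predictions, produced on data each base learner did not see during training, rather than on the raw features.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(X, y):
    """Hypothetical base learner: predict the class whose mean vector is closest."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def one_nn(X, y):
    """Hypothetical base learner: copy the label of the nearest training point."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda p: sq_dist(x, p[0]))[1]


def stump(X, y):
    """Hypothetical base learner: single-feature threshold rule with fewest training errors."""
    labels = sorted(set(y))
    best = None  # (errors, feature, threshold, label_at_or_above, label_below)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for hi in labels:
                for lo in labels:
                    err = sum((hi if x[f] >= t else lo) != lab for x, lab in zip(X, y))
                    if best is None or err < best[0]:
                        best = (err, f, t, hi, lo)
    _, f, t, hi, lo = best
    return lambda x: hi if x[f] >= t else lo


def lookup_meta(Z, y):
    """Hypothetical phase-2 meta-learner: map each tuple of base predictions to the
    true label seen most often with it; unseen tuples fall back to a plain vote."""
    table = defaultdict(Counter)
    for z, label in zip(Z, y):
        table[z][label] += 1

    def predict(z):
        votes = table[z] if z in table else Counter(z)
        return votes.most_common(1)[0][0]
    return predict


def fit_two_phase_stack(X, y, base_learners, meta_learner, k=3):
    """Phase 1: base learners; phase 2: meta-learner trained on their
    out-of-fold predictions. Returns a predict function for new points."""
    n = len(X)
    order = sorted(range(n), key=lambda i: y[i])  # group indices by class, then
    folds = [order[i::k] for i in range(k)]       # deal round-robin (stratified folds)
    Z = [None] * n
    for fold in folds:
        held_out = set(fold)
        tr = [i for i in range(n) if i not in held_out]
        models = [fit([X[i] for i in tr], [y[i] for i in tr]) for fit in base_learners]
        for i in fold:
            Z[i] = tuple(m(X[i]) for m in models)  # predictions, not raw features
    final_models = [fit(X, y) for fit in base_learners]  # refit on all data
    meta = meta_learner(Z, y)
    return lambda x: meta(tuple(m(x) for m in final_models))


# Toy imbalanced data: 12 majority points (label 0), 3 minority points (label 1).
X = [(i % 4 * 0.5, i // 4 * 0.5) for i in range(12)] + [(5.0, 5.0), (5.5, 4.5), (4.5, 5.5)]
y = [0] * 12 + [1] * 3
predict = fit_two_phase_stack(X, y, [nearest_centroid, one_nn, stump], lookup_meta)
print(predict((5.2, 5.1)), predict((0.7, 0.3)))  # → 1 0
```

Training the meta-learner on out-of-fold predictions, rather than on predictions the base learners make on their own training data, is a common stacking design choice: it keeps the meta-learner from simply trusting whichever base learner memorised the training set. Stratifying the folds ensures each fold keeps some minority-class instances, which matters when the minority class is very small.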

Highlights

Classification is a data mining task concerned with the supervised identification of class labels, given a large training dataset
This paper presents a two-phase stacking model that effectively handles imbalances contained in data
This paper proposes a two-phase stacking ensemble technique aimed at countering data imbalances in benchmark datasets taken from the UCI and KEEL repositories

Summary

INTRODUCTION

Classification is a data mining task concerned with the supervised identification of class labels, given a large training dataset. The performance of classifiers is often hindered by intrinsic properties of the data and its distribution. One major issue in many real-time datasets is data imbalance [1]. Because the majority classes contain a large number of instances, the classifier is overtrained on them, while the minority classes, which contain few instances, receive little training. This biased training leads to poor predictions, particularly for the minority classes.
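The effect of biased training described above can be made concrete with a small calculation. The counts below are hypothetical and are not taken from the paper's experiments; the "classifier" is the extreme case of majority-biased training that always predicts the majority class, and the helper names are illustrative. It shows why overall accuracy can look high on imbalanced data while the minority class is never predicted correctly; per-class recall and balanced accuracy (the mean of per-class recalls) expose the problem.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def per_class_recall(y_true, y_pred):
    """Fraction of each class's instances that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical test set: 95 majority instances (0) and 5 minority instances (1).
y_true = [0] * 95 + [1] * 5
# Extreme majority bias: the classifier always predicts class 0.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = per_class_recall(y_true, y_pred)
balanced_accuracy = sum(recalls.values()) / len(recalls)

print(imbalance_ratio(y_true))  # → 19.0
print(accuracy)                 # → 0.95
print(recalls)                  # → {0: 1.0, 1: 0.0}
print(balanced_accuracy)        # → 0.5
```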

LITERATURE REVIEW

Findings

CONCLUSION