Abstract

In the data mining community, imbalanced class distribution data sets have received increasing attention. The evolving field of data mining and knowledge discovery seeks to establish precise and effective computational tools for analyzing such data sets and extracting novel information from data. Sampling methods re-balance imbalanced data sets and consequently improve the performance of classifiers. When classifying imbalanced data sets, over-fitting and under-fitting are the two most prominent problems. In this study, a novel weighted ensemble method is proposed to reduce the influence of over-fitting and under-fitting when classifying such data sets. Forty imbalanced data sets with varying imbalance ratios are used to conduct a comparative study. The performance of the proposed method is compared with four conventional classifiers: decision tree (DT), k-nearest neighbor (KNN), support vector machines (SVM), and neural network (NN). This evaluation is carried out with two over-sampling techniques, the adaptive synthetic sampling approach (ADASYN) and the synthetic minority over-sampling technique (SMOTE). The proposed method was effective in reducing the impact of over-fitting and under-fitting on the classification of these data sets.
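The abstract names SMOTE and ADASYN as the over-sampling procedures but does not show how they generate samples. The following is a minimal NumPy sketch of the textbook formulations, not the paper's code. It assumes numeric features, Euclidean distance, a binary problem, and `k` smaller than the number of minority examples; the names `smote`, `adasyn`, `_nearest`, `k` and `beta` are illustrative. SMOTE interpolates between a minority point and a random minority neighbour; ADASYN does the same but assigns more synthetic points to minority examples whose neighbourhood is dominated by the majority class.

```python
import numpy as np


def _nearest(X_query, X_ref, k, same_set=False):
    """Indices of the k nearest rows of X_ref for each row of X_query (Euclidean)."""
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    if same_set:
        np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE: new points on segments joining minority points to minority neighbours."""
    rng = np.random.default_rng(rng)
    nn = _nearest(X_min, X_min, k, same_set=True)
    base = rng.integers(0, len(X_min), size=n_new)     # random minority seed points
    nbr = nn[base, rng.integers(0, k, size=n_new)]      # one random neighbour per seed
    gap = rng.random((n_new, 1))                        # random position in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def adasyn(X, y, minority_label, beta=1.0, k=5, rng=None):
    """ADASYN: SMOTE-style interpolation, weighted toward hard minority points."""
    rng = np.random.default_rng(rng)
    min_idx = np.flatnonzero(y == minority_label)
    X_min = X[min_idx]
    n_maj = len(y) - len(min_idx)                       # binary problem assumed
    G = int((n_maj - len(min_idx)) * beta)              # total synthetic points
    # r_i: fraction of majority points among the k neighbours of x_i in the full data
    d = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(min_idx)), min_idx] = np.inf        # exclude x_i itself
    nn_all = np.argsort(d, axis=1)[:, :k]
    r = np.mean(y[nn_all] != minority_label, axis=1)
    if r.sum() == 0:                                    # no hard points: spread evenly
        r = np.ones_like(r)
    g = np.rint(r / r.sum() * G).astype(int)            # synthetic points per x_i
    nn_min = _nearest(X_min, X_min, k, same_set=True)
    new = [X_min[i] + rng.random() * (X_min[j] - X_min[i])
           for i, gi in enumerate(g)
           for j in nn_min[i, rng.integers(0, k, size=gi)]]
    return np.array(new).reshape(-1, X.shape[1])
```

Both functions return only the synthetic rows, which would be stacked onto the original training data before fitting a classifier. Because of rounding, ADASYN's output size is only approximately `G`. The full pairwise distance matrix keeps the sketch short but is only practical for small data sets.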

Highlights

  • In the data mining community, imbalanced class distribution data sets have received increasing attention (He and Garcia, 2009)

  • An imbalanced data set has unequal numbers of majority and minority class examples (Chawla et al., 2004). When learning from these data sets, investigators face two significant issues, over-fitting and under-fitting, which are the foremost reasons for the poor performance of machine learning (ML) algorithms

  • To counter these issues related to imbalanced data sets, the objectives of this research work are as follows: 1. A novel weighted ensemble method will be proposed


Summary

Background

In the data mining community, imbalanced class distribution data sets have received increasing attention (He and Garcia, 2009). An imbalanced data set has unequal numbers of majority and minority class examples (Chawla et al., 2004). One study reported that existing imbalanced learning methods that employ standard SVMs largely ignore vital information in the majority class and produce over-fitted results (Zhang and Wang, 2013). Neural networks fail to converge in classification problems involving imbalanced data sets (Panchal et al., 2011; Piotrowski and Napiorkowski, 2013); this failure may be caused by too few hidden neurons (Zhang and Wang, 2013). A review of the available literature shows that it lacks ensemble methods that can efficiently handle fitting and generalization problems simultaneously. To counter these issues related to imbalanced data sets, the objectives of this research work are as follows: 1.
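Since an imbalanced data set is characterised by unequal majority and minority class counts, and the study compares data sets by their imbalance ratio, a minimal sketch of one common convention is shown here: the size of the largest class divided by the size of the smallest. The summary does not state which definition the paper uses, so this formula and the function name `imbalance_ratio` are assumptions.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Imbalance ratio (IR): size of the largest class divided by the smallest."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


y = ["neg"] * 95 + ["pos"] * 5
print(imbalance_ratio(y))  # 19.0
```

An IR of 1.0 means perfectly balanced classes; larger values mean the minority class is rarer.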

Decision Tree
K Nearest Neighbour
Majority label for $Y_i$, where $i \in I$
Neural Network
Synthetic Minority Over-sampling Technique
Adaptive Synthetic Sampling Approach
Proposed Weighted Ensemble Method
Simulation Study
Performance Evaluation Measures
Concluding Remarks
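The outline names a proposed weighted ensemble method, but this summary does not describe its weighting scheme. Purely as an illustration, the sketch below shows a generic weighted majority vote over the four base classifiers compared in the study (DT, KNN, SVM, NN). The weights are hypothetical inputs (for example, each classifier's validation score), and the function `weighted_vote` is not claimed to be the authors' method.

```python
import numpy as np


def weighted_vote(predictions, weights, classes):
    """Combine base-classifier label predictions by weighted majority vote.

    predictions: shape (n_classifiers, n_samples), one row of labels per classifier.
    weights: one non-negative weight per classifier (hypothetical, e.g. validation scores).
    """
    predictions = np.asarray(predictions)
    weights = np.asarray(weights, dtype=float)
    scores = np.zeros((len(classes), predictions.shape[1]))
    for c_idx, c in enumerate(classes):
        # total weight of the classifiers that predicted class c, per sample
        scores[c_idx] = weights @ (predictions == c).astype(float)
    # the class with the largest total weight wins; ties go to the earlier class
    return np.asarray(classes)[np.argmax(scores, axis=0)]


# Rows: hypothetical DT, KNN, SVM and NN predictions for three samples.
preds = [[1, 0, 0],
         [1, 1, 0],
         [0, 1, 0],
         [0, 1, 1]]
weights = [0.4, 0.3, 0.2, 0.1]
print(weighted_vote(preds, weights, classes=[0, 1]))  # [1 1 0]
```

For the first sample an unweighted vote would be a 2-2 tie; the weights break it in favour of the classifiers with higher (hypothetical) reliability.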
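The outline also lists performance evaluation measures without giving their definitions. Assuming the standard confusion-matrix forms of accuracy, F-measure and G-mean (common choices for imbalanced data, but not confirmed by this summary), they can be sketched as follows, with the minority class treated as positive. The function name `binary_metrics` is illustrative.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, F-measure and G-mean for a binary problem (positive = minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0            # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)               # balances both classes
    return {"accuracy": acc, "f_measure": f, "g_mean": g_mean}


# A classifier that always predicts the majority class on a 95:5 data set.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(binary_metrics(y_true, y_pred))  # {'accuracy': 0.95, 'f_measure': 0.0, 'g_mean': 0.0}
```

The example shows why accuracy alone is misleading on imbalanced data: ignoring the minority class entirely still yields 95% accuracy, while F-measure and G-mean both drop to zero.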