Resampling Imbalanced Data and Impact of Attribute Selection Methods in High Dimensional Data

K Ulaga Priya, S Pushpa

doi:10.1007/978-981-19-4044-6_2

Abstract

Class imbalance in classification datasets remains a challenge for the machine learning community, particularly with respect to performance on minority data: poor representation of the minority class degrades the performance of the classification algorithm, as is evident from various assessment metrics. This paper deals with high-dimensional imbalanced datasets and the effectiveness of feature selection on imbalanced data. Imbalance in a high-dimensional dataset leads to suboptimal classifier performance. Moreover, handling high-dimensional imbalanced data poses additional technical challenges that can result in overfitted classifiers. Several approaches have been suggested in the literature. In this paper, different feature selection methods are explored, and random undersampling is applied to the selected features. The classification results reveal that feature selection and sampling are paramount to achieving the best possible results from high-dimensional imbalanced data. The results show that sampling performed on a subset of features yields good classification performance.

Keywords: Machine learning, Imbalance data, Sampling, Feature selection, Classification
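
The pipeline outlined in the abstract (select a feature subset, then randomly undersample the majority class) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the feature selection methods used, so a simple mean-difference filter stands in for them, and the helper names `select_top_k_features` and `random_undersample`, the toy data, and the choice of `k=2` are all hypothetical.

```python
import random
from collections import Counter


def select_top_k_features(X, y, k):
    """Return indices of the k features with the largest absolute
    difference between the two class means (an assumed stand-in filter)."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch assumes a binary problem")
    scores = []
    for j in range(len(X[0])):
        means = []
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            means.append(sum(vals) / len(vals))
        scores.append(abs(means[0] - means[1]))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the minority class size."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    keep = []
    for c in set(y):
        idx = [i for i, label in enumerate(y) if label == c]
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): 20 majority samples, 4 minority samples,
# 5 features, of which only feature 2 carries class information.
rng = random.Random(42)
y = [0] * 20 + [1] * 4
X = [[rng.random() for _ in range(5)] for _ in y]
for row, label in zip(X, y):
    row[2] += 5.0 * label

# Step 1: feature selection on the full imbalanced data.
selected = select_top_k_features(X, y, k=2)
X_sel = [[row[j] for j in selected] for row in X]

# Step 2: random undersampling on the reduced feature set.
X_bal, y_bal = random_undersample(X_sel, y)
```

The balanced, reduced data (`X_bal`, `y_bal`) would then be passed to whichever classifier is under evaluation. The order shown here, selection first and sampling second, follows the abstract's description of applying undersampling "with the selected features".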

Full Text