Abstract
Classification of imbalanced datasets has attracted considerable attention in data mining and machine learning. The task is challenging because the classes are unevenly distributed, and conventional algorithms perform poorly when the data are also high dimensional. Researchers have proposed many techniques for improving the classification of high dimensional class-imbalanced data. These approaches fall into five categories: (1) cost-sensitive learning, (2) algorithm-level methods, (3) data-level methods, (4) ensemble learning, and (5) feature selection. In this paper, we combine cost-sensitive learning with feature selection. The proposed method first uses fuzzy c-means clustering (FCM) to estimate the probability that each sample belongs to the minority class. Samples that are hard to classify are identified from these probabilities, and cost-sensitive learning is then used to estimate their costs. Finally, a particle swarm optimization (PSO) algorithm enhanced with Lévy flight performs feature selection, and an SVM model is trained to classify the data. Using PSO together with the SVM classifier identifies the best features and reduces the misclassification rate of minority samples. The method is evaluated with the G-Mean metric under 10-fold cross-validation. The experimental results show that the proposed method outperforms other approaches in classifying high dimensional class-imbalanced data.
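To illustrate why G-Mean suits imbalanced data, the following is a minimal sketch of the metric in its usual binary form, the geometric mean of sensitivity (minority-class recall) and specificity (majority-class recall). The function name `g_mean`, the `minority_label` parameter, and the toy labels are assumptions made for illustration only; they are not taken from the paper's implementation.

```python
from math import sqrt


def g_mean(y_true, y_pred, minority_label=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Illustrative helper (hypothetical name): any label other than
    `minority_label` is treated as the majority class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == minority_label:
            if pred == minority_label:
                tp += 1  # minority sample correctly recognised
            else:
                fn += 1  # minority sample missed
        else:
            if pred == minority_label:
                fp += 1  # majority sample flagged as minority
            else:
                tn += 1  # majority sample correctly recognised
    # Guard against a class being absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


# Toy data: 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10  # a classifier that ignores the minority class

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
score = g_mean(y_true, always_majority)
```

In this toy case the trivial classifier reaches an accuracy of 0.8 while its G-Mean is 0, because sensitivity is 0. This is the behaviour that makes the metric a reasonable choice when the minority class is the one of interest.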