Research on the Improvement of K-Nearest Neighbor Classifier for Imbalanced Text Categorization

Yanmei Yang, Linying Xu

doi:10.1109/imccc.2018.00204

Abstract

Some of the most widely used text classification methods, such as the K-Nearest Neighbor (KNN) algorithm, the Naive Bayes (NB) algorithm, and the Support Vector Machine (SVM) algorithm, perform well on balanced data but poorly on imbalanced data. Many researchers have proposed solutions to this problem; we propose a new method to improve the performance of the K-Nearest Neighbor classifier on imbalanced classification. In this paper, we combine the K-Nearest Neighbor classifier with a new feature selection method called NFS, an improved Synthetic Minority Over-sampling Technique (SMOTE), and the Tomek Links under-sampling technique. The experimental results demonstrate that the improved method significantly improves the classification efficiency of the K-Nearest Neighbor classifier on the biased (imbalanced) dataset.
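To make the combination of resampling and KNN concrete, the following is a minimal sketch under stated assumptions, not the method proposed in the paper. It uses the classic SMOTE interpolation and the standard Tomek-link definition (mutual nearest neighbours with different labels) rather than the improved SMOTE variant, and it omits the NFS feature selection step, since neither is specified in the abstract. The function names (`smote`, `remove_tomek_links`, `knn_predict`), the toy two-dimensional vectors standing in for text features, and the order of the two resampling steps are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(i, X, candidates):
    """Indices from `candidates` (excluding i), sorted by distance to X[i]."""
    return sorted((j for j in candidates if j != i), key=lambda j: dist(X[i], X[j]))


def smote(X, y, minority, n_new, k=2, seed=0):
    """Classic SMOTE (assumed, not the paper's improved variant):
    interpolate between a minority point and one of its k minority neighbours."""
    rng = random.Random(seed)
    idx = [i for i, label in enumerate(y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(idx)
        j = rng.choice(nearest(i, X, idx)[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return X + synthetic, y + [minority] * n_new


def remove_tomek_links(X, y, majority):
    """Drop the majority-class member of every Tomek link, i.e. every pair of
    mutual nearest neighbours that carry different labels."""
    all_idx = range(len(X))
    nn = [nearest(i, X, all_idx)[0] for i in all_idx]
    drop = {i for i in all_idx
            if nn[nn[i]] == i and y[i] != y[nn[i]] and y[i] == majority}
    keep = [i for i in all_idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_predict(X, y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(X)), key=lambda i: dist(X[i], query))[:k]
    return Counter(y[i] for i in order).most_common(1)[0][0]


# Toy data: five majority points (one sitting next to the minority cluster)
# and two minority points.
X = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2], [0.9, 1.0],
     [1.0, 1.0], [1.5, 1.5]]
y = ["maj"] * 5 + ["min"] * 2

X_bal, y_bal = smote(X, y, minority="min", n_new=3)           # over-sample
X_clean, y_clean = remove_tomek_links(X_bal, y_bal, "maj")    # under-sample
print(Counter(y_clean), knn_predict(X_clean, y_clean, [1.2, 1.2]))
```

Over-sampling adds synthetic minority points so the vote among the k neighbours is less dominated by the majority class, while Tomek-link removal deletes majority points that sit on the class boundary. In practice the inputs would be feature vectors derived from text (for example, term weights after feature selection) rather than hand-written coordinates.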
