An Improved KNN Algorithm for Text Classification

Huijuan Li, Bing Han, He Jiang, Dongyuan Wang

doi:10.1109/imccc.2018.00225

Abstract

Among the many text classification algorithms based on the vector space model, the KNN (K-Nearest Neighbor) classifier performs particularly well. In KNN classification, the similarity computed between documents directly determines which K neighbors are selected, and therefore strongly affects classification quality. However, traditional KNN text classification computes text similarity too coarsely, ignoring both the relations within a document and the relationships between documents. This paper therefore proposes an improved KNN algorithm that calculates similarity by taking into account the interaction and coupling relationships within and between documents. Theoretical analysis and experiments show that the improved algorithm overcomes the shortcomings of previous algorithms and improves the accuracy of KNN text classification.
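For orientation, the traditional baseline that the abstract criticizes can be sketched roughly as follows. This is a minimal illustration of plain KNN over a vector space model, not the improved algorithm proposed in the paper: the abstract gives no formulas for the coupling-based similarity, so none is attempted here. The raw term-frequency weighting, whitespace tokenization, similarity-weighted voting, the toy corpus, and the names `tf_vector`, `cosine_similarity`, and `knn_classify` are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed weighting: raw counts)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training_docs, k=3):
    """Label a query with the similarity-weighted vote of its k nearest documents."""
    q = tf_vector(query)
    scored = [(cosine_similarity(q, tf_vector(text)), label)
              for text, label in training_docs]
    # Keep the k most similar training documents.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
training_docs = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the player signed a new contract with the team", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell after the rates decision", "finance"),
    ("the bank reported strong quarterly profit", "finance"),
]

print(knn_classify("the bank cut interest rates", training_docs, k=3))
```

In this sketch each term is treated as an independent dimension, so two documents are similar only to the extent that they share exact words. That is the kind of coarse similarity the abstract describes, where relations among terms within a document and relationships across documents play no role.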
