TKNN: An Improved KNN Algorithm Based on Tree Structure

Li Juan

doi:10.1109/cis.2011.310

Abstract

Text classification is the process of assigning documents to one or more of a set of predefined categories. It is widely used in applications such as web page categorization, email spam filtering, and document indexing. Many popular algorithms for text classification have been proposed, such as Naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). However, these approaches do not perform well in multi-class text classification because they rely heavily on linear classifiers. KNN is a simple and mature algorithm, but it cannot effectively handle overlapping category boundaries, unbalanced class samples, the determination of the k value, or an overly large search space. In this paper, we propose TKNN, a new algorithm that incorporates a tree structure and an adaptive k-value method into the classical KNN algorithm. TKNN can overcome these shortcomings of KNN and improve the performance of multi-class text classification. Theoretical analysis and experimental results show that TKNN greatly improves classification efficiency compared with KNN.

Full Text