Analysis of article type category based on KNN

Qiushi Wang

doi:10.54254/2755-2721/48/20241341

Abstract

Natural Language Processing (NLP) has a long history: it has evolved since the 1950s at the intersection of artificial intelligence and linguistics. Over time it has converged with Information Retrieval (IR) and adopted symbolic, statistical, and connectionist approaches. This study applies the k-nearest neighbours (KNN) algorithm to article categorization and examines whether articles can be classified accurately given a sufficiently large dataset and clearly defined category labels. A KNN-based model was built to classify articles by their content. After an appropriate value of k was selected, the model's predictions were assessed with a confusion matrix, and the experiment achieved an accuracy of 0.96. The approach is limited, however, when the dataset is small or the articles are unevenly distributed across categories. The paper also discusses the roles of data quality and feature engineering in article type classification, and it points to alternative methods, such as Bayesian methods, Support Vector Machines, and deep learning, as directions for future research.
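The workflow summarized in the abstract (represent each article as a feature vector, vote among the k nearest labelled articles, then assess the predictions with a confusion matrix) can be illustrated with a minimal Python sketch. It is not the implementation used in the paper. The bag-of-words features, the cosine similarity measure, the toy articles, and all function names here are assumptions chosen for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Majority vote among the k training articles most similar to `text`."""
    query = vectorize(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(vectorize(item[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(true_labels, predicted_labels):
    """Counts keyed by (true label, predicted label)."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical toy data, not taken from the paper's dataset.
train = [
    ("team wins football match", "sport"),
    ("striker scores goal football", "sport"),
    ("coach praises team match", "sport"),
    ("phone ships faster processor", "tech"),
    ("software update fixes bug", "tech"),
    ("laptop processor runs software", "tech"),
]
test = [
    ("team scores late goal", "sport"),
    ("software bug slows phone", "tech"),
]

true_labels = [label for _, label in test]
predicted = [knn_predict(train, text, k=3) for text, _ in test]

print(confusion_matrix(true_labels, predicted))
print(accuracy(true_labels, predicted))
```

In this sketch k is fixed at 3. In practice it would be chosen by comparing accuracy on held-out articles across several candidate values, and term weighting such as TF-IDF would usually replace raw counts.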

Full Text