Automatic Text Classification in Information Retrieval

Sanjay K Dwivedi, Chandrakala Arya

doi:10.1145/2905055.2905191

Abstract

Improving information retrieval performance depends on the accessibility, selection, and management of the large amounts of information on the web, most of which is expressed as textual data. Supervised machine learning is an important tool for automating information retrieval tasks. This paper reviews supervised machine learning approaches to text classification for information retrieval and presents a comparative study of four supervised machine learning algorithms: Naive Bayes (NB), support vector machine (SVM), k-nearest neighbor (KNN), and decision tree (DT). We used the WEKA (Waikato Environment for Knowledge Analysis) tool to evaluate these four algorithms through a series of experiments. In WEKA, Sequential Minimal Optimization (SMO) implements the SVM classifier, IBk implements k-nearest neighbor, and J48 is the decision tree implementation used. The results indicate that the performance of the algorithms depends on the characteristics of the datasets.

Full Text