Sentence level text classification in the Kannada language - a classifier's perspective

R Jayashree, K Srikantamurthy, Basavaraj S Anami

doi:10.1504/ijcvr.2015.071335

Abstract

Better information retrieval techniques are needed to address the problem of information explosion. A major portion of the data available online is text, which gives rise to a huge feature space; hence, structured organisation and retrieval are very important. Information retrieval (IR) in the context of Indian languages is not uncommon, but IR in the South Indian language Kannada is quite new. This work focuses on sentence level text classification in the Kannada language, which is a fine grained approach to text classification; here, we look at the suitability of classifiers such as naïve Bayesian, bag of words and support vector machine (SVM) for this task. Dimensionality reduction is carried out using two different approaches, minimum term frequency and stop word removal, and the performance of the above mentioned classifiers is noted.
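
As an illustration only, the two dimensionality reduction steps named above (stop word removal and a minimum term frequency cut-off) can be sketched together with a simple multinomial naïve Bayes sentence classifier. This is a minimal sketch and not the authors' implementation: the stop word list, the threshold value, the category labels, the function names and the English stand-in sentences are all hypothetical, chosen only to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop word list and threshold; the paper's actual values are not given here.
STOP_WORDS = {"the", "a", "is", "in", "of"}
MIN_TERM_FREQ = 2


def tokenize(sentence):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in sentence.lower().split() if t not in STOP_WORDS]


def build_vocabulary(sentences, min_tf=MIN_TERM_FREQ):
    """Keep only terms that occur at least min_tf times in the training set."""
    counts = Counter(t for s in sentences for t in tokenize(s))
    return {t for t, c in counts.items() if c >= min_tf}


def train_naive_bayes(sentences, labels, vocab):
    """Collect per-class sentence counts and per-class term counts."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for s, y in zip(sentences, labels):
        term_counts[y].update(t for t in tokenize(s) if t in vocab)
    return class_counts, term_counts


def predict(sentence, model, vocab):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, term_counts = model
    total = sum(class_counts.values())
    tokens = [t for t in tokenize(sentence) if t in vocab]
    best_label, best_score = None, -math.inf
    for y in sorted(class_counts):
        score = math.log(class_counts[y] / total)
        denom = sum(term_counts[y].values()) + len(vocab)
        for t in tokens:
            score += math.log((term_counts[y][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Toy training data: English stand-ins for labelled Kannada sentences (hypothetical).
TRAIN = [
    ("the team won the cricket match", "sports"),
    ("a cricket match is in the city", "sports"),
    ("the team lost a match", "sports"),
    ("the price of gold rose", "commerce"),
    ("gold price is high in the market", "commerce"),
    ("the market price of rice rose", "commerce"),
]
sentences = [s for s, _ in TRAIN]
labels = [y for _, y in TRAIN]

vocab = build_vocabulary(sentences)
model = train_naive_bayes(sentences, labels, vocab)
print(sorted(vocab))
print(predict("the cricket team lost", model, vocab))
print(predict("gold price in the market", model, vocab))
```

In this sketch the two reduction steps act before training: stop words never reach the counts, and rare terms are dropped from the vocabulary, so the classifier only ever sees the reduced feature space. The same vocabulary could equally feed an SVM or another classifier.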

Full Text