News Category Classification Using Distinctive Bag of Words and ANN Classifier

Amritpal Singh, Sunil Kumar Chhillar

doi:10.23956/ijermt.v6i6.288

Abstract

News category classification is a multi-label text classification problem: the goal is to assign one or more categories to a news article. A standard technique in multi-label text classification is to use a set of binary classifiers. For each category, a classifier gives a “yes” or “no” answer on whether that category should be assigned to a text. Standard algorithms used for such binary classifiers include Naive Bayes classifiers, Support Vector Machines, and artificial neural networks (ANNs). In this work, a distinctive bag of words is used as the feature set, built from the high-frequency word tokens found in each individual news category. The algorithm presented is based on a keyword extraction algorithm that handles English-language text, and it considers news categories such as business, entertainment, politics, and sports. Intra-class news classification has also been carried out, with Cricket and Football within the sports category selected to verify the performance of the algorithm. Experimental results show a high classification rate in identifying the category of a news document.
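The abstract does not spell out how the distinctive bag of words is constructed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "distinctive" means: take the `top_k` most frequent tokens of each category and discard any token that also appears in another category's top list. The function names (`tokenize`, `distinctive_vocab`, `featurize`), the `top_k` parameter, and the toy corpus are all hypothetical and introduced for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def distinctive_vocab(docs_by_category, top_k=2):
    """Assumed rule (not taken from the paper): keep each category's
    top_k most frequent tokens, then drop any token that also appears
    in another category's top_k list."""
    top = {}
    for category, docs in docs_by_category.items():
        counts = Counter(tok for doc in docs for tok in tokenize(doc))
        top[category] = [tok for tok, _ in counts.most_common(top_k)]

    vocab = {}
    for category, tokens in top.items():
        shared = {t for c, toks in top.items() if c != category for t in toks}
        vocab[category] = [t for t in tokens if t not in shared]
    return vocab


def featurize(text, vocab):
    """Count how often each distinctive word occurs in the text.
    Categories are visited in sorted order so the vector layout is stable."""
    counts = Counter(tokenize(text))
    return [counts[tok] for category in sorted(vocab) for tok in vocab[category]]


# Toy, made-up corpus for the intra-class (sports) setting.
toy_corpus = {
    "cricket": [
        "the batsman hit the ball over the wicket",
        "the bowler took a wicket",
    ],
    "football": [
        "the striker hit the ball into the goal",
        "the keeper saved the goal",
    ],
}

vocab = distinctive_vocab(toy_corpus, top_k=2)
features = featurize("the bowler took another wicket", vocab)
print(vocab)
print(features)
```

In this sketch the common word "the" is frequent in both categories and is therefore dropped, while "wicket" and "goal" survive as category-specific features. A vector like `features` is the kind of input that could be passed to one binary classifier per category, such as a small ANN.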
