An alternative framework for univariate filter based feature selection for text categorization

D. S. Guru, Mahamad Suhil, Lavanya Narayana Raju, N. Vinay Kumar

doi:10.1016/j.patrec.2017.12.025

Abstract

In this paper, we introduce an alternative framework for selecting the most relevant subset of the original feature set for text categorization. Given a feature set and a local feature evaluation function (such as the chi-square measure or mutual information), the proposed framework ranks features in groups instead of ranking them individually. The group of features with rank r is more discriminative than the group with rank r+1. Each group is a subset of features that is expected to be capable of discriminating every class from every other class. An added advantage of the proposed framework is that it automatically eliminates redundant features during selection, without having to evaluate features in combination. The framework also helps handle overlapping classes effectively by selecting low-ranked yet powerful features. Extensive experiments were conducted on three benchmark datasets using four different local feature evaluation functions with Support Vector Machine and Naïve Bayes classifiers to demonstrate the effectiveness of the proposed framework over the corresponding conventional approaches.
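The abstract does not spell out how the groups are formed, so the following Python sketch shows only one possible reading of group-wise ranking and should not be taken as the authors' algorithm. It assumes that a local evaluation function has already produced one score per (feature, class) pair, and that group r collects, for each class, the best feature for that class that has not been selected in an earlier group. The function name `rank_feature_groups`, the `local_scores` layout, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Sequence


def rank_feature_groups(
    local_scores: Dict[str, Sequence[float]],
    num_groups: int,
) -> List[List[str]]:
    """Form ranked groups of features from per-class local scores.

    Assumption (not from the paper): group r holds, for every class, the
    highest-scoring feature for that class that is not already selected.
    """
    if not local_scores:
        return []
    num_classes = len(next(iter(local_scores.values())))

    # One ranking per class, best feature first; ties are broken by name.
    per_class = [
        sorted(local_scores, key=lambda f, c=c: (-local_scores[f][c], f))
        for c in range(num_classes)
    ]

    selected = set()
    groups: List[List[str]] = []
    for _ in range(num_groups):
        group = []
        for ranking in per_class:
            # Take the best remaining feature for this class, so a feature
            # is never placed in more than one group.
            pick = next((f for f in ranking if f not in selected), None)
            if pick is not None:
                selected.add(pick)
                group.append(pick)
        if not group:  # every feature has been assigned already
            break
        groups.append(group)
    return groups


# Toy scores for three classes (sport, politics, finance); values are made up.
toy_scores = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "vote": [0.1, 0.9, 0.2],
    "party": [0.0, 0.7, 0.1],
    "stock": [0.1, 0.2, 0.9],
    "bank": [0.2, 0.3, 0.6],
}
print(rank_feature_groups(toy_scores, num_groups=2))
# [['goal', 'vote', 'stock'], ['match', 'party', 'bank']]
```

In this sketch, selecting the first k groups yields a feature subset in which every class is represented at every rank, in contrast to a single global ranking where a few classes can dominate the top positions. The sketch only prevents a feature from being picked twice; it does not reproduce whatever redundancy handling the paper itself defines.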

Journal: Pattern Recognition Letters
Publication Date: Jan 2, 2018
Citations: 33

Similar Papers

Information gain and divergence-based feature selection for machine learning-based text categorization
Changki Lee ... Gary Geunbae Lee
Information Processing & Management | VOL. 42
03 Aug 2005

An Improved Ambiguity Measure Feature Selection for Text Categorization
Zhiying Liu ... Jieming Yang
01 Aug 2012

A Data-drive Feature Selection Method in Text Categorization
Yan Xu
Journal of Software | VOL. 6
04 Jan 2011

Dynamic Feature Selection in Text Classification
Son Doan ... Susumu Horiguchi
01 Jan 2006
