Abstract
The high dimensionality of text data hinders classifier performance, making it necessary to apply feature selection for dimensionality reduction. Most feature ranking metrics for text classification are based on the document frequencies (df) of a term in the positive and negative classes. Ranking features on document frequencies alone favors terms that occur frequently in the larger classes of unbalanced datasets. In this paper we introduce a new feature ranking metric, termed the relative discrimination criterion (RDC), which takes into account the document frequencies for each term count of a term when estimating the term's usefulness. The performance of RDC is compared with four well-known feature ranking metrics, information gain (IG), chi-squared (CHI), odds ratio (OR), and the distinguishing feature selector (DFS), using support vector machine (SVM) and multinomial naive Bayes (MNB) classifiers on four benchmark datasets: Reuters, 20 Newsgroups, and two subsets of the Ohsumed dataset. Our results, based on macro and micro F1 measures, show that the performance of RDC is superior to the other four metrics in 65% of our experimental trials. RDC also attains the highest macro and micro F1 values in 69% of the cases.
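For concreteness, the sketch below illustrates how one of the baseline document-frequency-based metrics named above (odds ratio) can be scored from per-class document frequencies. This is a minimal illustration of the df-based ranking family the abstract discusses, not the paper's RDC formula, which is defined in the body of the paper; the function name `odds_ratio_scores` and the `smoothing` parameter are illustrative assumptions.

```python
import numpy as np

def odds_ratio_scores(X, y, smoothing=1.0):
    """Score terms by log odds ratio from per-class document frequencies.

    X : binary document-term matrix, shape (n_docs, n_terms);
        X[d, t] = 1 if term t occurs in document d.
    y : binary labels, shape (n_docs,); 1 marks the positive class.
    """
    pos, neg = (y == 1), (y == 0)
    n_pos, n_neg = pos.sum(), neg.sum()

    # Document frequency of each term in the positive / negative class.
    df_pos = X[pos].sum(axis=0).astype(float)
    df_neg = X[neg].sum(axis=0).astype(float)

    # P(t | c) estimates with additive smoothing to avoid division by zero.
    p_pos = (df_pos + smoothing) / (n_pos + 2 * smoothing)
    p_neg = (df_neg + smoothing) / (n_neg + 2 * smoothing)

    # log OR = log[ p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg) ]
    return np.log(p_pos * (1 - p_neg)) - np.log((1 - p_pos) * p_neg)

# Toy example: term 0 concentrates in the positive class, term 1 does not,
# so term 0 receives the higher score and is kept when selecting top-k.
X = np.array([[1, 1], [1, 0], [1, 1], [0, 1]])
y = np.array([1, 1, 1, 0])
scores = odds_ratio_scores(X, y)
top_k = np.argsort(scores)[::-1][:1]
print(scores, top_k)
```

Because such scores are built only from document frequencies, a term common in a large class can outrank a genuinely discriminative term from a small class, which is the imbalance problem RDC is designed to address.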