Abstract

Feature selection is important for speeding up Automatic Text Document Classification (ATDC). At present, the most common approach to discriminative feature selection is the Global Filter-based Feature Selection Scheme (GFSS). GFSS assigns each feature a score based on its discriminating power and selects the top-N features from the feature set, where N is determined empirically. As a result, the features of a few classes may be discarded either partially or completely. The Improved Global Feature Selection Scheme (IGFSS) addresses this issue by selecting an equal number of representative features from every class. However, it performs poorly on unbalanced datasets with a large number of classes, where the distribution of features across classes is highly variable. In such cases, choosing an equal number of features from each class may exclude important features from classes containing a larger number of features. To overcome this problem, we propose a novel Variable Global Feature Selection Scheme (VGFSS) that selects a variable number of features from each class based on the distribution of terms across classes, while ensuring that a minimum number of terms is selected from every class. Numerical results on benchmark datasets show the effectiveness of the proposed VGFSS algorithm over classical information-science methods and IGFSS.
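The core idea of VGFSS, as described above, is to replace IGFSS's equal per-class quota with a variable quota derived from how terms are distributed across classes, subject to a guaranteed minimum per class. The following is a minimal sketch of that idea; the function name, the proportional allocation rule, and the deduplication step are illustrative assumptions, not the paper's exact formulation.

```python
def vgfss_select(class_features, total_n, min_per_class=1):
    """Illustrative sketch of a VGFSS-style selection (not the paper's exact rule).

    class_features: dict mapping class label -> list of (feature, score)
        pairs, scored by any filter metric (e.g. information gain).
    total_n: total number of features to select across all classes.
    min_per_class: guaranteed minimum number of features per class.
    """
    total_terms = sum(len(feats) for feats in class_features.values())
    selected = []
    for label, feats in class_features.items():
        # Per-class quota proportional to the class's share of terms,
        # floored at the guaranteed minimum (assumed allocation rule).
        quota = max(min_per_class,
                    round(total_n * len(feats) / total_terms))
        top = sorted(feats, key=lambda fs: fs[1], reverse=True)[:quota]
        selected.extend(f for f, _ in top)
    # Deduplicate while preserving order, since a term may score
    # highly in more than one class.
    seen = set()
    return [f for f in selected if not (f in seen or seen.add(f))]
```

A class contributing three quarters of the candidate terms would receive roughly three quarters of the budget, while a class with a single term still contributes at least `min_per_class` features, which is the behavior the abstract attributes to VGFSS.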

