Abstract
Feature selection is a vital preprocessing step for text classification, used to mitigate the curse of dimensionality. Most existing metrics (such as information gain) evaluate features individually and completely ignore the redundancy between them. This can decrease the overall discriminative power, because the predictive power of one feature is weakened by the others. On the other hand, although higher-order algorithms (such as mRMR) do take redundancy into account, their high computational complexity makes them impractical in the text domain. This paper proposes a novel metric called global information gain (GIG), which avoids redundancy naturally. An efficient feature selection method called maximizing global information gain (MGIG) is also given. We compare MGIG with four other algorithms on six datasets; the experimental results show that MGIG performs better than the other methods in most cases. Moreover, MGIG runs significantly faster than traditional higher-order algorithms, which makes it a suitable choice for feature selection in the text domain.
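To make concrete what "evaluating features individually" means, the following is a minimal sketch (not the paper's GIG/MGIG method) of the conventional per-feature information gain ranking that the abstract contrasts against; the function names and the binary term-document representation are illustrative assumptions.

```python
import numpy as np

def information_gain(X, y):
    """Per-feature information gain IG(t) = H(C) - H(C | t) for a binary
    term-document matrix X (n_docs x n_terms) and class labels y.
    Each term is scored independently, which is exactly the 'individual'
    evaluation the abstract criticises: redundancy among the selected
    terms is never measured."""
    n_docs, n_terms = X.shape
    classes, y_idx = np.unique(y, return_inverse=True)
    # prior class entropy H(C)
    p_c = np.bincount(y_idx) / n_docs
    h_c = -np.sum(p_c * np.log2(p_c))
    cond_entropy = np.zeros(n_terms)
    for t in range(n_terms):
        present = X[:, t] > 0
        for mask in (present, ~present):
            p_mask = mask.mean()
            if p_mask == 0:
                continue
            # class distribution conditioned on term presence/absence
            p_c_given = np.bincount(y_idx[mask], minlength=len(classes)) / mask.sum()
            nz = p_c_given > 0
            cond_entropy[t] += p_mask * (-np.sum(p_c_given[nz] * np.log2(p_c_given[nz])))
    return h_c - cond_entropy

def select_top_k(X, y, k):
    """Rank terms by individual IG and keep the top k (no redundancy check)."""
    return np.argsort(information_gain(X, y))[::-1][:k]
```

Because each term is ranked in isolation, two highly correlated terms can both receive top scores even though one adds little discriminative power once the other is selected; avoiding this without the cost of pairwise methods such as mRMR is the gap the proposed GIG/MGIG approach targets.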