A new feature selection method for handling redundant information in text classification

You-wei Wang, Li-zhou Feng

doi:10.1631/fitee.1601761

Abstract

Feature selection is an important approach to dimensionality reduction in text classification. Because it is difficult to handle the redundant information that selected features contain, we propose a simple new feature selection method that effectively filters redundant features. First, to quantify the relationship between two words, we introduce definitions of word-frequency-based relevance and correlative redundancy. Next, an optimal feature selection (OFS) method is chosen to obtain a feature subset FS1. Finally, the redundant features in FS1 are filtered against a predetermined threshold, and, to improve execution speed, the filtered features are stored in linked lists. Experiments are carried out on three datasets (WebKB, 20-Newsgroups, and Reuters-21578) using support vector machine and naive Bayes classifiers. The results show that the classification accuracy of the proposed method is generally higher than that of typical traditional methods (information gain, improved Gini index, and improved comprehensively measured feature selection) and of the OFS methods. Moreover, the proposed method runs faster than typical mutual information-based methods (improved and normalized mutual information-based feature selection, and multilabel feature selection based on maximum dependency and minimum redundancy) while maintaining classification accuracy. Statistical results validate the effectiveness of the proposed method in handling redundant information in text classification.
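The threshold-based filtering step can be illustrated with a minimal sketch. The abstract does not give the formulas for word-frequency-based relevance or correlative redundancy, so the code below substitutes an assumed stand-in score (Jaccard overlap of the documents in which two words occur). The function names, the toy data, and the threshold value are all hypothetical, and a plain Python list is used in place of the linked lists the abstract mentions.

```python
from typing import Dict, List, Set


def cooccurrence_redundancy(docs_a: Set[int], docs_b: Set[int]) -> float:
    """Assumed stand-in score: Jaccard overlap of two words' document sets.

    This is NOT the paper's correlative redundancy definition.
    """
    union = docs_a | docs_b
    if not union:
        return 0.0
    return len(docs_a & docs_b) / len(union)


def filter_redundant(ranked_features: List[str],
                     postings: Dict[str, Set[int]],
                     threshold: float) -> List[str]:
    """Greedy pass over a ranked subset (playing the role of FS1).

    A word is kept only if its redundancy with every already-kept word
    does not exceed the threshold.
    """
    kept: List[str] = []
    for word in ranked_features:
        docs = postings.get(word, set())
        if all(cooccurrence_redundancy(docs, postings.get(k, set())) <= threshold
               for k in kept):
            kept.append(word)
    return kept


# Toy data: word -> ids of the documents that contain it (hypothetical).
postings = {
    "football": {0, 1, 2, 3},
    "soccer": {0, 1, 2},
    "election": {4, 5},
    "vote": {4, 5, 6},
    "recipe": {7},
}
# Assume an upstream selector has already ranked these words, best first.
fs1 = ["football", "soccer", "election", "vote", "recipe"]
selected = filter_redundant(fs1, postings, threshold=0.6)
print(selected)  # → ['football', 'election', 'recipe']
```

With a threshold of 0.6, "soccer" is dropped because it overlaps heavily with "football", and "vote" is dropped because it overlaps with "election". Raising the threshold to 1.0 keeps every word, since no score can exceed it.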

Journal: Frontiers of Information Technology & Electronic Engineering
Publication Date: Feb 1, 2018
Citations: 10

Similar Papers

Feature redundancy removal for text classification using correlated feature subsets
Lazhar Farek ... Amira Benaidja
Computational Intelligence | VOL. 40
21 Dec 2023

Research on Feature Selection and kNN Classification Method in Chinese Text Classification
Chao Xiao ... Ping Wu
01 Jan 2015

Novel artificial bee colony based feature selection method for filtering redundant information
Youwei Wang ... Lizhou Feng
Applied Intelligence | VOL. 48
04 Aug 2017

Overall Survival Prognostic Modelling of Non-small Cell Lung Cancer Patients Using Positron Emission Tomography/Computed Tomography Harmonised Radiomics Features: The Quest for the Optimal Machine Learning Algorithm
Mehdi Amini ... Habib Zaidi
Clinical Oncology | VOL. 34
03 Dec 2021
