An optimal approach for text feature selection

Wassim El-Hajj, Hazem Hajj

doi:10.1016/j.csl.2022.101364

Abstract

Traditionally, feature selection is conducted by first deriving a candidate list of features, then ranking them and selecting the top features based on a predefined threshold. These methods are highly dependent on the choice of threshold and therefore lead to sub-optimal text categorization results. In this paper, we address the selection problem by proposing a one-step method designed to optimally select the subset of features. The selection is formulated mathematically as an optimization problem whose objective is to maximize classification accuracy while simultaneously deriving and choosing the most discriminative features. Our method, MFX, is applicable to many conventional methods and has two distinguishing aspects. First, it treats all documents from the same category as one extended document instead of analyzing individual documents. Second, it chooses the most discriminative terms: those that are frequent and common across all documents of the same category and minimally present in other categories. Moreover, MFX is language-independent. It was tested on the well-known Reuters RCV1 benchmark dataset and, to showcase its language independence, on Arabic datasets extracted from Arabic news sources. The results indicated that MFX always performed similarly to or better than other well-known feature selection methods. MFX combined with a Support Vector Machine (SVM) classifier was also shown to outperform recent text classification algorithms based on neural networks and word embeddings.
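The two distinguishing aspects described above (pooling a category's documents into one extended document, and preferring terms that are frequent and common inside a category but rare elsewhere) can be illustrated with a small sketch. This is not MFX itself: the abstract does not give the paper's optimization formulation, so the scoring function here is an assumption. It multiplies a term's relative frequency in the extended document by the fraction of the category's documents containing it, then subtracts the largest such weight the term reaches in any other category. The names `category_weights` and `select_features` are hypothetical, and the simple `score > 0` cut is only a stand-in for the paper's optimization step.

```python
from collections import Counter


def category_weights(docs):
    """Hypothetical per-term weight within one category.

    docs: non-empty list of tokenized documents (lists of strings)
    that all belong to the same category.
    """
    # Pool every document of the category into one extended document.
    extended = Counter(tok for doc in docs for tok in doc)
    total = sum(extended.values())
    # Count how many of the category's documents contain each term at least once.
    coverage = Counter(tok for doc in docs for tok in set(doc))
    # Frequent in the extended document AND common across its documents.
    return {
        term: (count / total) * (coverage[term] / len(docs))
        for term, count in extended.items()
    }


def select_features(docs_by_category):
    """Return, per category, terms weighted higher there than in any other category.

    Assumption: a positive margin over every other category stands in for the
    paper's optimization-based selection, which is not reproduced here.
    """
    weights = {cat: category_weights(docs) for cat, docs in docs_by_category.items()}
    selected = {}
    for cat, w in weights.items():
        scores = {}
        for term, inside in w.items():
            # Strongest presence of the same term in any other category.
            outside = max(
                (other.get(term, 0.0) for c, other in weights.items() if c != cat),
                default=0.0,
            )
            scores[term] = inside - outside
        keep = [t for t, s in scores.items() if s > 0]
        # Highest-scoring terms first; ties broken alphabetically.
        selected[cat] = sorted(keep, key=lambda t: (-scores[t], t))
    return selected
```

For example, with `{"sports": [["goal", "match", "team"], ["team", "match", "win"]], "finance": [["stock", "market", "team"], ["market", "price"]]}`, the term `team` is kept for `sports`, where it appears in every document, but dropped for `finance`, where it is weaker than in `sports`. Scoring at the category level rather than per document is what lets a term's spread across a category's documents count toward its weight.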

Journal: Computer Speech & Language
Publication Date: Feb 1, 2022
Citations: 2

Similar Papers

Word Embeddings for Natural Language Processing
01 Jan 2015

Boosting capuchin search with stochastic learning strategy for feature selection
Mohamed Abd Elaziz ... Rehab Ali Ibrahim
Neural Computing and Applications | VOL. 35
22 Mar 2023

Gene Selection by Hybrid Feature Selection Approaches and Classification Techniques in Microarray Dataset for Cancer Prediction
W Jaisingh ... Subash Chandra Bose Jaganathan
15 Dec 2022

Neural word and entity embeddings for ad hoc retrieval
Ebrahim Bagheri ... Feras Al-Obeidat
Information Processing and Management | VOL. 54
25 Apr 2018