Abstract

Feature selection is a significant challenge in pattern recognition, especially for classification tasks. The quality of the selected features plays a critical role in building effective models, and poor-quality data makes this process harder. This work explores the use of association analysis from data mining to select meaningful features while addressing the problem of duplicated information among the selected features. A novel feature selection technique for text classification is proposed, based on frequent and correlated items. The method considers both relevance and feature interactions, using association as the metric for evaluating the relationship between the target and the features. The technique was tested on the SMS Spam Collection dataset from the UCI Machine Learning Repository and compared with well-known feature selection methods. The results show that the proposed technique effectively reduces redundant information while achieving high accuracy (95.155%) using only 6% of the features.
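
The abstract does not spell out the exact algorithm, but the general idea of selecting features that are frequent, strongly associated with the class, and not redundant with one another can be illustrated with a minimal sketch. The toy documents, the thresholds, the use of lift as the association measure, and Pearson correlation for redundancy filtering below are all illustrative assumptions, not the authors' published method.

```python
# Minimal sketch (assumptions, not the paper's exact algorithm):
# 1) keep only "frequent" terms (support above a threshold),
# 2) rank them by association with the class (lift is used here),
# 3) greedily drop terms highly correlated with an already selected term
#    to reduce redundant information.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy SMS-like data (hypothetical); 1 = spam, 0 = ham
docs = ["free entry win prize", "call me later", "win cash now", "see you at lunch"]
labels = np.array([1, 0, 1, 0])

vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs).toarray()          # binary term-occurrence matrix
vocab = vec.get_feature_names_out()

n = len(docs)
support = X.mean(axis=0)                       # P(term)
p_spam = labels.mean()                         # P(spam)
joint = X[labels == 1].sum(axis=0) / n         # P(term AND spam)
lift = joint / (support * p_spam + 1e-12)      # association of term with class

# Frequent terms with above-random association, ranked by lift
candidates = [i for i in np.argsort(-lift) if support[i] >= 0.25 and lift[i] > 1.0]

# Redundancy removal: skip a candidate if it is strongly correlated
# with a term that was already selected
selected = []
for i in candidates:
    redundant = any(abs(np.corrcoef(X[:, i], X[:, j])[0, 1]) > 0.8 for j in selected)
    if not redundant:
        selected.append(i)

print([vocab[i] for i in selected])
```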
