A New Feature Selection Metric Based on Rough Sets and Information Gain in Text Classification

Rasim ÇEKİK, Mahmut KAYA

doi:10.54287/gujsa.1379024

Abstract

In text classification, treating the words in text documents as features creates a very high-dimensional feature space. This is known as the high dimensionality problem in text classification. The most common and effective way to address it is to select a suitable subset of features using a feature selection approach. In this paper, a new feature selection approach called Rough Information Gain (RIG) is presented as a solution to the high dimensionality problem. RIG extracts hidden and meaningful patterns in text data with the help of Rough Sets and computes a score value based on these patterns. When pattern extraction is completely uncertain, the proposed approach falls back on the selection strategy of the Information Gain Selection (IG) approach. In the experimental studies, the performance of RIG is compared with that of the Information Gain Selection (IG), Chi-Square (CHI2), Gini Coefficient (GI), and Discriminative Feature Selector (DFS) approaches using the Micro-F1 metric. According to the results, RIG outperforms the other methods.
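For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the classical Information Gain score for a single term, followed by top-k feature selection. It is not the RIG metric proposed in the paper, whose Rough Set component is not described in the abstract. The function names (`entropy`, `information_gain`, `select_top_k`), the representation of documents as sets of words, and the toy corpus are all assumptions made for illustration.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """Classical IG of a term: H(class) - H(class | term present/absent).

    docs   : list of sets of words (hypothetical representation)
    labels : list of class labels, one per document
    """
    classes = sorted(set(labels))
    # Per-class document counts for documents that do / do not contain the term.
    with_term = [sum(1 for d, y in zip(docs, labels) if y == c and term in d)
                 for c in classes]
    without_term = [sum(1 for d, y in zip(docs, labels) if y == c and term not in d)
                    for c in classes]
    n = len(docs)
    prior = [a + b for a, b in zip(with_term, without_term)]
    conditional = (sum(with_term) / n) * entropy(with_term) \
        + (sum(without_term) / n) * entropy(without_term)
    return entropy(prior) - conditional


def select_top_k(docs, labels, k):
    """Rank every term in the vocabulary by IG and keep the k best."""
    vocabulary = set().union(*docs)
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(vocabulary,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy corpus (illustrative only, not data from the paper).
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]

selected = select_top_k(docs, labels, 2)
```

In this toy corpus, "goal" and "vote" each separate the two classes perfectly, while "team" occurs equally often in both and carries no information, so a selector built on this score would keep the former and discard the latter.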

Journal: Gazi University Journal of Science Part A: Engineering and Innovation
Publication Date: Dec 31, 2023
Citations: 1

Similar Papers

Empirical Study on Filter based Feature Selection Methods for Text Classification
Saptarsi Goswami ... Subhajit Deysarakar
International Journal of Computer Applications | VOL. 81
15 Nov 2013

Rough Set Based Approach to Text Classification
Libiao Zhang ... Yuefeng Li
01 Nov 2013

Two new feature selection metrics for text classification
Durmuş Özkan Şahin ... Erdal Kılıç
Automatika | VOL. 60
03 Apr 2019

Ambiguity measure feature‐selection algorithm
Saket S.R. Mengle ... Nazli Goharian
Journal of the American Society for Information Science and Technology | VOL. 60
02 Feb 2009
