A Comparative Study on Feature Selection Techniques for Multi-cluster Text Data

Ananya Gupta, Shahin Ara Begum

doi:10.1007/978-981-13-0761-4_21

Abstract

Text clustering involves data of very high dimension. Feature selection techniques find subsets of relevant features from the original feature space that enable efficient and effective clustering. Selecting features merely by their ranking scores, without considering the correlation among them, can degrade clustering performance. An efficient feature selection technique should be capable of preserving the multi-cluster structure of the data. The purpose of the present work is to demonstrate that feature selection techniques which take into account the correlation among features in a multi-cluster scenario yield better clustering results than techniques that simply rank features independently of each other. This paper compares two feature selection techniques in this regard, viz. the traditional Tf-Idf and the Multi-Cluster Feature Selection (MCFS) technique. Experimental results on the TDT2 and Reuters-21578 datasets show that MCFS gives superior clustering results to traditional Tf-Idf.
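The independent-ranking style of selection that the abstract contrasts with MCFS can be illustrated with a minimal sketch. It assumes one common variant of Tf-Idf-based selection (score each term by its mean Tf-Idf weight over the corpus, then keep the top k terms); the paper's exact scoring and preprocessing are not given in the abstract, and the function names, whitespace tokenization, and toy documents below are hypothetical. MCFS itself is not implemented here.

```python
import math
from collections import Counter


def tfidf_feature_scores(docs):
    """Score each term by its mean Tf-Idf weight across all documents.

    Assumption: tf = term count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf / n_docs
    return dict(scores)


def select_top_k(scores, k):
    """Keep the k highest-scoring terms.

    Each term is ranked on its own score only, so correlations among
    terms and the cluster structure of the data are ignored.
    """
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus with two rough topics (finance, sport).
docs = [
    "stocks fell as markets closed",
    "markets rallied as stocks rose",
    "the team won the final match",
    "the match ended as the team lost",
]
scores = tfidf_feature_scores(docs)
top_terms = select_top_k(scores, 3)
```

Because every term is scored in isolation, nothing in this procedure prevents the selected terms from all describing the same cluster, which is the limitation the abstract attributes to ranking-only methods.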
