An Anchor-Based Fuzzy Rough Feature Selection for Text Categorization

Ananya Gupta, Shahin Ara Begum

doi:10.1007/978-981-19-5936-3_26

Abstract

Big datasets are characterized by high dimensionality, often comprising hundreds of thousands of features, along with uncertainty and imprecision. Representing such datasets in memory-constrained environments is a challenging task. Feature selection techniques reduce the dimensionality of these datasets by finding subsets of relevant features in the original feature space. To produce an optimal feature subset, an ideal feature selection technique should be capable of handling the interdependencies and uncertainties among the features. In this paper, we propose a new hybrid feature selection technique for text categorization, called anchor-based fuzzy rough feature selection (ABFRFS), which combines anchor graph-based learning with fuzzy rough feature selection. Although anchor graph-based feature selection and fuzzy rough feature selection have each been proposed independently, ABFRFS hybridizes the two with the aim of overcoming the inherent problem of representing big datasets in memory-constrained environments while retaining the interdependencies and uncertainty among the features and maintaining clustering accuracy. With the proposed ABFRFS technique, the feature space is reduced by 98% on average across the considered benchmark datasets, with an acceptable degree of clustering accuracy.

Full Text