Fuzzy Rough Set-Based Feature Selection for Text Categorization

Ananya Gupta, Shahin Ara Begum

doi:10.1007/978-981-19-8566-9_4

Abstract

Recent technological advances have led to the accumulation of large volumes of data in digital repositories. Mining such repositories for information retrieval is challenging in terms of both dimensionality and sample size. Text mining in particular is confronted with high-dimensional data, so reducing that dimensionality becomes necessary. Fuzzy rough set feature selection techniques have proved highly efficient for dimension reduction: they can handle data dependencies and reduce data dimensionality without compromising the performance of classification and clustering. This paper reviews major developments in fuzzy rough set-based feature selection over a period of 20 years and discusses the potential of the approach in the domain of text categorization. A hybrid feature selection technique is proposed that combines large-scale spectral clustering with landmark-based representation and fuzzy rough feature selection; it is found to work efficiently in memory-constrained environments. Moreover, the proposed technique substantially reduces data dimensionality on the considered datasets while retaining an acceptable degree of clustering accuracy.

Full Text