Abstract

The volume of textual data in digital form grows every day. Text classification is used to organize these data, and data preprocessing is an essential phase in building efficient classifiers: it prepares the raw text for machine learning models. Text classification, however, suffers from the high dimensionality of the feature space. Feature selection is a data preprocessing technique widely applied to high-dimensional data; it addresses this high dimensionality and improves text classification efficiency. Feature selection studies how to choose the set of features used to build text classification models. Its goals include reducing dimensionality, removing uninformative features, reducing the amount of data the classifier must learn from, and improving the classifier's predictive performance. This paper presents the different feature selection methods, together with their advantages and limitations.
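To make the dimensionality-reduction idea concrete, the following minimal sketch (not taken from the paper) applies one common filter-style feature selection method, a chi-square test, to a toy text-classification corpus using scikit-learn. The corpus, the class labels, and the choice of k below are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (illustrative, not from the paper): chi-square feature
# selection on a tiny toy corpus. The documents, labels, and k are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = [
    "the goalkeeper made a brilliant save",
    "the striker scored in the final minute",
    "the new processor doubles inference speed",
    "the update patches a kernel vulnerability",
]
labels = [0, 0, 1, 1]  # 0 = sports, 1 = technology (illustrative classes)

# Bag-of-words representation: one feature per vocabulary term,
# which is what makes the feature space high-dimensional on real corpora.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
print("original feature count:", X.shape[1])

# Keep only the k terms most associated with the class labels (chi-square test).
selector = SelectKBest(chi2, k=5)
X_reduced = selector.fit_transform(X, labels)
print("reduced feature count:", X_reduced.shape[1])

# Inspect which terms survived the filter.
kept_terms = vectorizer.get_feature_names_out()[selector.get_support()]
print("selected terms:", list(kept_terms))
```

The reduced matrix `X_reduced` would then be fed to a classifier in place of the full term matrix, which is the sense in which feature selection shrinks the data the classifier must learn from.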
