Feature selection methods for text classification: a systematic literature review

Julliano Trindade Pintas,Ana Cristina Bicharra Garcia,Leandro A F Fernandes

doi:10.1007/s10462-021-09970-6

Abstract

Feature Selection (FS) methods alleviate key problems in classification procedures as they are used to improve classification accuracy, reduce data dimensionality, and remove irrelevant data. FS methods have received a great deal of attention from the text classification community. However, only a few literature surveys include them focusing on text classification, and the ones available are either a superficial analysis or present a very small set of work in the subject. For this reason, we conducted a Systematic Literature Review (SLR) that asses 1376 unique papers from journals and conferences published in the past eight years (2013–2020). After abstract screening and full-text eligibility analysis, 175 studies were included in our SLR. Our contribution is twofold. We have considered several aspects of each proposed method and mapped them into a new categorization schema. Additionally, we mapped the main characteristics of the experiments, identifying which datasets, languages, machine learning algorithms, and validation methods have been used to evaluate new and existing techniques. By following the SLR protocol, we allow the replication of our revision process and minimize the chances of bias while classifying the included studies. By mapping issues and experiment settings, our SLR helps researchers to develop and position new studies with respect to the existing literature.

Full Text