A Systematic Literature Review of Text Classification: Datasets and Methods

Gusti Muhammad Riduan, Teguh Bharata Adji, Indah Soesanti

doi:10.1109/icitisee53823.2021.9655788

Abstract

We study the literature in major journals and conferences on the use of shallow learning and deep learning methods for text classification. Shallow learning techniques such as Naive Bayes, Support Vector Machine, and Random Forests were initially widely used to solve text classification problems. However, these techniques generally require a precise feature extraction model, which is often very complex to build if high accuracy is to be achieved. For this reason, researchers continue to look for learning techniques that are more efficient and give a significant increase in accuracy. As a result, deep learning methods such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) are now more widely used for text classification. This literature study aimed to identify and assess the research methods and datasets used in text classification studies from 2016 to the present. Based on the established inclusion and exclusion criteria, seventy-three text classification research articles published between January 2016 and July 2021 were selected for further analysis. The review was conducted systematically: a systematic literature review is a method for identifying, evaluating, and interpreting all available study materials in order to answer specific research questions. The following diagram depicts the overall distribution of text classification methods. Public datasets were used in 85 percent of the research studies, whereas private datasets were used in the remaining 15 percent. Twenty different methods have been used, and the eight most commonly used approaches in text classification were identified from among them.
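
To illustrate the shallow-learning pipeline described above, where an explicit feature extraction step feeds a classifier, the following is a minimal sketch of multinomial Naive Bayes over bag-of-words counts in plain Python. The lowercase whitespace tokenizer, the add-alpha smoothing parameter `alpha`, the class name `BagOfWordsNaiveBayes`, and the toy spam/ham documents are hypothetical choices for illustration, not the method of any reviewed study.

```python
import math
from collections import Counter


def tokenize(text):
    # Feature extraction (assumed): lowercase whitespace tokenization into a bag of words.
    return text.lower().split()


class BagOfWordsNaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        class_docs = Counter(labels)
        # Log prior: fraction of training documents in each class.
        self.log_prior = {c: math.log(class_docs[c] / len(docs)) for c in self.classes}
        # Total token count per class, used in the likelihood denominator.
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed estimate of log P(token | class c).
        num = self.word_counts[c][token] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


train_docs = ["cheap pills buy now", "win money now", "meeting at noon", "project meeting notes"]
train_labels = ["spam", "spam", "ham", "ham"]
model = BagOfWordsNaiveBayes().fit(train_docs, train_labels)
print(model.predict("buy cheap money"))         # spam
print(model.predict("notes from the meeting"))  # ham
```

Because the classifier only sees what `tokenize` produces, the quality of that feature extraction step largely bounds its accuracy, which is the limitation of shallow methods noted above.
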
To improve the accuracy of machine learning classifiers for text classification, researchers recommended combining various machine learning methods, employing improved algorithms, adding feature selection, and applying parameter optimization to some classifiers. The findings of this study also revealed that are frequently mentioned and thus significant in the field of text classification.
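
One of the recommendations above, adding feature selection, can be sketched with a chi-square filter that keeps the k terms most strongly associated with any class. This is a minimal plain-Python sketch under stated assumptions: chi-square is only one of several possible selection criteria, terms are counted by document presence, and the helper names `chi_square_scores` and `select_top_k` and the toy data are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Hypothetical helper: score each term by its maximum chi-square statistic over classes."""
    n = len(docs)
    doc_sets = [set(d.lower().split()) for d in docs]  # document-presence features
    vocab = set().union(*doc_sets)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for c, n_c in class_sizes.items():
            # 2x2 contingency table of (in class c?) x (contains term?).
            a = sum(1 for s, y in zip(doc_sets, labels) if y == c and term in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != c and term in s)
            c_absent = n_c - a          # in class c, term absent
            d = (n - n_c) - b           # other classes, term absent
            denom = (a + c_absent) * (b + d) * (a + b) * (c_absent + d)
            if denom:
                best = max(best, n * (a * d - c_absent * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(docs, labels, k):
    """Hypothetical helper: keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


train_docs = ["cheap pills buy now", "win money now", "meeting at noon", "project meeting notes"]
train_labels = ["spam", "spam", "ham", "ham"]
print(select_top_k(train_docs, train_labels, 2))  # ['meeting', 'now']
```

Terms that appear in every document of one class and in none of the other ("meeting", "now") score highest, so a downstream classifier trained on only the selected terms works with a smaller, more discriminative feature set.
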

Full Text