Abstract

The fuzzy rough set (FRS) is a powerful mathematical tool for dealing with uncertain data, with many applications in feature selection, dimensionality reduction and classification. The fuzzy rough set based on robust nearest neighbor (FRS-RNN) is an important classifier that has been successfully applied to real-valued datasets. The literature shows no attempt to apply FRS-RNN to text document classification. Generally, the document classification process consists of two crucial phases: feature extraction and classifier model construction. TF-IDF and convolutional neural network (CNN)-based techniques are the main approaches used for efficient feature extraction. The CNN provides the best feature engineering by effectively preprocessing the documents for better representation using pre-trained word embeddings. In this paper, we propose a modified CNN structure for both text document classification and feature extraction. Both FRS and FRS-RNN are then implemented for text document classification on the benchmark datasets 20 Newsgroups and Reuters-21578, using both TF-IDF and modified CNN-based feature extraction techniques. The classification performance of FRS, CNN and FRS-RNN is evaluated and compared using accuracy, precision, recall and F1-measure. Finally, the classification performance of FRS-RNN is compared with state-of-the-art traditional classification models such as SVM, KNN, Naive Bayes, DNN, CNN and RNN, and with some recently developed classification models. The experimental results and empirical evaluation show that the proposed FRS-RNN outperforms all the aforementioned classification models.
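
The four evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not say how per-class scores are averaged, so macro-averaging is an assumption here, and the function name `macro_metrics` and the toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro-averaging (unweighted mean over classes) is an assumption;
    the abstract does not state which averaging scheme was used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        prec = tp[c] / predicted if predicted else 0.0
        rec = tp[c] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with two hypothetical document classes.
scores = macro_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(scores)
```

With imbalanced corpora, macro- and micro-averaged scores can differ noticeably, so the averaging scheme should be checked against the full paper before comparing numbers.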
