Comparative Performance of Machine Learning Methods for Text Classification

Dianabasi Nkantah, Bello Aliyu Muhammad, Rahat Iqbal, Tun Myat Aung, Ni Ni Hla, Anne James

doi:10.1109/iccit-144147971.2020.9213788

Abstract

Machine learning methods, including deep learning, are popular for data processing. Deep learning methods have shown great promise when applied to natural language processing (NLP) tasks. Text classification is a supervised machine learning task that uses labelled documents to train a classifier. Previous work applying machine learning and deep learning methods to text classification has been tested on relatively small-sized data instances. In this paper, we compared the performance of machine learning and deep learning algorithms on a text classification task. We also studied and compared the scalability of these methods with respect to larger data instances. We used support vector machines (SVM), logistic regression, random forest and Naive Bayes as the machine learning algorithms, and a convolutional neural network (CNN) as the deep learning method. The task was a multi-class classification problem with six (6) classes and six thousand (6,000) data instances, with an average of 20 sentences in each data instance. The CNN deep learning algorithm outperformed all the machine learning algorithms, achieving an accuracy of over 85%. This is because the filter weights are learned and updated during backward propagation in each epoch, which results in better performance than the traditional methods.
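
To make the supervised setup concrete, the following is a minimal sketch of one of the traditional baselines named above, Naive Bayes, trained on labelled documents for a multi-class task. It is not the authors' implementation, which the abstract does not describe. The tokenizer, the class name `MultinomialNaiveBayes`, the add-one smoothing, and the three-class toy corpus are all assumptions made for illustration; the study itself used six classes and 6,000 data instances.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_doc_counts = Counter(labels)   # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) for each class."""
        vocab_size = len(self.vocabulary)
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy corpus, invented for this sketch (not the paper's data).
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the bank raised interest rates",
    "shares fell as the market closed",
    "the new phone has a faster chip",
    "the software update fixes a bug",
]
train_labels = ["sport", "sport", "business", "business", "tech", "tech"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)

for query in ["the striker scored in the match", "the bank raised rates"]:
    print(query, "->", model.predict(query))
```

Unlike the CNN, whose filter weights are adjusted by backward propagation over many epochs, this baseline is fitted in a single counting pass over the training documents, which is one way to see the difference between the two families of methods being compared.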

Full Text