Performance Analysis of Most Prominent Machine Learning and Deep Learning Algorithms In Classifying Bangla Crime News Articles

Salma Tabashum, Ariful Islam, Mun Yea Mahafi Taz Zahara, Md Mamun Hossain, Fahmida Naznin Fami

doi:10.1109/tensymp50017.2020.9230785

Abstract

This work addresses Bangla crime type classification, an area in which very little prior work exists. To carry out this research, we first developed a Bangla crime dataset containing around 24,295 news articles and made most of them publicly available on GitHub. We then built a crime classifier model and trained it on this dataset. We evaluated word vector representations such as bag of words and TF-IDF with state-of-the-art machine learning algorithms, and the most promising semantic and syntactic word embeddings, such as Word2Vec, GloVe, and fastText, with both shallow and deep CNNs and RNNs, in order to select the best word embedding for our classifier module. Finally, we summarize the experimental results in tabular form. The results show that deep learning algorithms achieve significantly higher accuracy than state-of-the-art machine learning algorithms in classifying Bangla crime data. In the final experiment, the proposed model achieves 93.70% accuracy using a shallow CNN with fastText.

Full Text