Automated System for Movie Review Classification using BERT

Shivani Rana, Shruti Jain, Rakesh Kanji

doi:10.2174/2666255816666230507182018

Abstract

Aims: Text classification has emerged as an important approach for advancing Natural Language Processing (NLP) applications that work with the text available on the web, and many applications for analyzing such text have been proposed in the literature. Background: NLP, with the help of deep learning, has achieved great success in automatically sorting text data into predefined classes, but this process is expensive and time-consuming. Objectives: To overcome this problem, this paper studies and implements various machine learning techniques to build an automated system for movie review classification. Methodology: The proposed methodology uses the Bidirectional Encoder Representations from Transformers (BERT) model for data preparation and makes predictions with several machine learning algorithms: XGBoost, support vector machine, logistic regression, naïve Bayes, and a neural network. The algorithms are compared on accuracy, precision, recall, and F1 score. Results: The 2-hidden-layer neural network outperforms the other models, achieving an F1 score above 0.90 within the first 15 epochs and 0.99 in just 40 epochs on the IMDB dataset, thus greatly reducing the time required. Conclusion: 100% accuracy is attained using the neural network, a 15% accuracy improvement and a 14.6% F1 score improvement over logistic regression.
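The four metrics named in the abstract can be illustrated with a minimal sketch for the binary case (positive versus negative reviews). This is not the authors' evaluation code: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are assumptions made purely for illustration, and the convention of returning 0.0 when a denominator is zero is one common choice among several.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the label
    treated as the positive class (e.g. a positive review).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels (made up, not from the IMDB dataset): 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, it only approaches 1.0 when both are high, which is why it is reported alongside accuracy when comparing classifiers.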
