Abstract

Recently, deep learning has achieved impressive success in text mining and natural language processing (NLP) tasks. BERT is one of the most successful deep learning models and is employed in a variety of NLP classification tasks such as intent and topic detection, question answering, sentiment analysis, and hate speech detection. Numerous studies have built classifiers on top of pre-trained BERT models, fine-tuning them by adding a simple fully connected layer, a BiLSTM, convolutional layers, or a combination of these. Each of those models fine-tunes BERT for a specific task, and the reported results do not consistently confirm either the efficiency of complex fine-tuned BERT models or their ability to generalize. In this study, we extensively inspect various BERT-based fine-tuning models for different text classification tasks. We implement several fine-tuned BERT models that differ in their classification layer and meticulously investigate their performance. The implemented models alternatively use deep learning networks such as convolutional networks and BiLSTMs as the classification layer, and the output layer of each model is studied with inputs drawn from distinct layers of BERT. We conduct extensive experiments to find the most generally outperforming model and discover that adding a simple dense layer as a classifier on top of the pre-trained BERT model surpasses the other deep neural network layers in the investigated tasks. We also examine different hyperparameter values to find the optimized combination providing the highest performance.
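
To make the best-performing configuration concrete, the following is a minimal sketch of attaching a single dense classification layer to a pre-trained BERT model, assuming the Hugging Face transformers and PyTorch APIs; the model name, dropout rate, and class sizes are illustrative defaults, not the authors' exact experimental setup.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertDenseClassifier(nn.Module):
    """Pre-trained BERT followed by a single fully connected classification layer."""

    def __init__(self, num_labels, pretrained_name="bert-base-uncased", dropout=0.1):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained_name)
        self.dropout = nn.Dropout(dropout)
        # Simple dense layer mapping BERT's pooled representation to class logits
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        )
        # Pooled [CLS] representation from BERT's final layer
        pooled = outputs.pooler_output
        return self.classifier(self.dropout(pooled))
```

During fine-tuning, the whole stack (BERT weights plus the dense layer) would typically be trained end-to-end with a cross-entropy loss; the heavier variants discussed in the abstract replace the single linear layer with BiLSTM or convolutional blocks.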
