Abstract

Language models that perform linguistic tasks at a human-like level have surpassed expectations in recent years. Recurrent Neural Networks (RNNs) and Transformer architectures have rapidly accelerated the development of Natural Language Processing and drastically changed how we handle textual data. Understanding the reliability and confidence of these models is crucial for building machine learning systems that can be deployed successfully in real-life situations. However, a quantitative, uncertainty-based comparison between these two families of architectures has not yet been conducted. Identifying confident models for text classification is vital, as the modern world demands safe and dependable intelligent systems. In this work, the uncertainty of Transformer-based models such as BERT and XLNet is compared to that of RNN variants such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). To measure model uncertainty, we apply dropout during the inference phase (Monte Carlo Dropout). Monte Carlo Dropout (MCD) incurs negligible computational cost and helps separate uncertain samples from confident predictions. Based on our thorough experiments, we find that BERT surpasses all other models evaluated in this study.
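To make the MCD procedure concrete, below is a minimal PyTorch sketch of the general technique the abstract describes: keeping dropout active at inference, running several stochastic forward passes, and scoring uncertainty from the spread of the predictions. This is not the authors' published implementation; `model` stands in for any classifier with dropout layers (a BERT, XLNet, LSTM, or GRU head), `model(inputs)` is assumed to return class logits, and predictive entropy is one common choice of uncertainty score.

```python
import torch
import torch.nn.functional as F


def enable_mc_dropout(model: torch.nn.Module) -> None:
    """Keep dropout layers active at inference; everything else stays in eval mode."""
    model.eval()
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()


@torch.no_grad()
def mc_dropout_predict(model: torch.nn.Module, inputs, n_passes: int = 30):
    """Run several stochastic forward passes and aggregate them.

    Returns the mean class probabilities and the predictive entropy,
    used here as a per-sample uncertainty score. Assumes `model(inputs)`
    yields a (batch, classes) tensor of logits.
    """
    enable_mc_dropout(model)
    # Each pass samples a different dropout mask, so the logits vary.
    probs = torch.stack(
        [F.softmax(model(inputs), dim=-1) for _ in range(n_passes)]
    )                                # shape: (n_passes, batch, classes)
    mean_probs = probs.mean(dim=0)   # shape: (batch, classes)
    # Predictive entropy: high values flag uncertain samples.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy
```

In this setup, thresholding the entropy separates uncertain samples from confident predictions, which is how MCD supports the kind of model comparison the abstract describes. The number of passes and the uncertainty measure are illustrative choices, not values taken from the paper.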
