Abstract

"Multilingual Machine Comprehension" is a QA sub-task that comprises citing an answer to a question from a context, even if that answer written in a separate language from the excerpt itself. A lot of models have been trained to answer the question from a given short context which is a limitation of MRC, few models are considering this problem and adapting to handle the large input context to make the MRC more accessible and applicable to open domain scenarios. In this study, we examine Multilingual Representations for Indian Languages (MuRIL), rebalanced multilingual BERT (RemBERT), and XLM-RoBERTa, which are all BERT-based deep learning models. We trained these models to work on multilingual MRC particularly for two of the most used Indian languages Hindi and Tamil The datasets utilized in this study are freely available. The results of our research reveal that RemBERT outperformed other BERT-based deep learning models. For the dataset employed, the model received an F1 score of 84.58, an Exact Match of 74.05, and a Jaccard Index of 0.81.
