Abstract
Multilingual pre-trained models make it possible to develop natural language processing (NLP) applications for low-resource languages (LRLs) using models of resource-rich languages (RRLs). However, the structural characteristics of the target language can affect task-specific learning. In this paper, we investigate the influence of the structural diversity of languages on overall system performance. Specifically, we propose a customized approach that leverages task-specific data from low-resource language families via transfer learning from an RRL. Our findings are based on question-answering tasks using the XLM-R, mBERT, and IndicBERT transformer models and Indic languages (Hindi, Bengali, and Telugu). On the XQuAD-Hindi dataset, few-shot learning using Bengali improves the benchmark mBERT (F1/EM) score by +(10.86/7.87) and the XLM-R score by +(3.84/4.42). Few-shot learning using Telugu also improves the mBERT score by +(10.42/7.36) and the XLM-R score by +(3.04/2.72). In addition, our model demonstrates benchmark-compatible performance in a zero-shot setup with single-epoch task learning. This approach can be adapted to other NLP tasks for LRLs.