Abstract
Large, publicly available datasets for part-of-speech (POS) tagging do not exist for some languages. One of these is Javanese, a local language of Indonesia that is considered low-resource. This research examines the effectiveness of cross-lingual transfer learning for Javanese POS tagging. State-of-the-art Transformer-based models such as IndoBERT, mBERT, and XLM-RoBERTa are first fine-tuned on higher-resource source languages such as Indonesian, English, Uyghur, Latin, and Hungarian, and then fine-tuned again on Javanese as the target language. We found that models trained with cross-lingual transfer learning are more accurate than models trained without it, by 14.3%–15.3% over LSTM-based models and by 0.21%–3.95% over Transformer-based models. The most accurate Javanese POS tagger is XLM-RoBERTa fine-tuned in two stages (first on Indonesian as the source language, then on Javanese as the target language), which achieves an accuracy of 87.65%.
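The two-stage schedule can be illustrated without a Transformer. The following is a minimal, hypothetical sketch, assuming a toy greedy perceptron tagger in place of XLM-RoBERTa, Universal POS tags (`NOUN`, `PRON`, `VERB`) rather than the paper's tag set, and two made-up sentences per language rather than the paper's data. The names `features`, `fine_tune`, and `tag` are illustrative only. What it shows is the warm start: stage two continues updating the stage-one weights instead of starting from zero.

```python
from typing import Dict, List, Tuple

# Hypothetical toy corpora (illustrative only; not the paper's data or tag set).
# Stage 1 source language: Indonesian.
SOURCE_ID = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
]
# Stage 2 target language: Javanese (ngoko).
TARGET_JV = [
    [("aku", "PRON"), ("mangan", "VERB"), ("sega", "NOUN")],
    [("dheweke", "PRON"), ("ngombe", "VERB"), ("banyu", "NOUN")],
]
TAGS = ["NOUN", "PRON", "VERB"]
Weights = Dict[Tuple[str, str], float]


def features(words: List[str], i: int, prev_tag: str) -> List[str]:
    """Toy features: word identity, 2-character suffix, and the previous tag."""
    w = words[i].lower()
    return [f"w={w}", f"suf={w[-2:]}", f"prev={prev_tag}"]


def predict_tag(weights: Weights, feats: List[str]) -> str:
    """Highest-scoring tag; max() keeps the earliest tag in TAGS on ties."""
    return max(TAGS, key=lambda t: sum(weights.get((f, t), 0.0) for f in feats))


def fine_tune(weights: Weights, corpus, epochs: int = 5) -> Weights:
    """Perceptron training that starts from `weights` and returns an updated copy."""
    weights = dict(weights)  # warm start: copy, so the input model is left untouched
    for _ in range(epochs):
        for sent in corpus:
            words = [w for w, _ in sent]
            prev = "<s>"
            for i, (_, gold) in enumerate(sent):
                feats = features(words, i, prev)
                guess = predict_tag(weights, feats)
                if guess != gold:
                    # Standard perceptron update: reward gold, penalize the guess.
                    for f in feats:
                        weights[(f, gold)] = weights.get((f, gold), 0.0) + 1.0
                        weights[(f, guess)] = weights.get((f, guess), 0.0) - 1.0
                prev = gold  # condition training on the gold previous tag
    return weights


def tag(weights: Weights, words: List[str]) -> List[str]:
    """Greedy left-to-right decoding using the model's own previous prediction."""
    prev, out = "<s>", []
    for i in range(len(words)):
        prev = predict_tag(weights, features(words, i, prev))
        out.append(prev)
    return out


stage1 = fine_tune({}, SOURCE_ID)      # stage 1: source language (Indonesian)
stage2 = fine_tune(stage1, TARGET_JV)  # stage 2: target language (Javanese), warm-started
print(tag(stage2, ["aku", "ngombe", "banyu"]))  # → ['PRON', 'VERB', 'NOUN']
```

In this toy, the previous-tag feature is shared by both languages, so the stage-one weights already tag the unseen Javanese sentence correctly and stage two has little left to correct. The sketch illustrates only the warm-start mechanics, not the accuracy gains reported in the abstract.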
Source: IAES International Journal of Artificial Intelligence (IJ-AI)