Improving the Robustness of Loanword Identification in Social Media Texts

Chenggang Mi

doi:10.1145/3572773

Abstract

As a potential bilingual resource, loanwords play an important role in many natural language processing tasks. If loanwords in a low-resource language can be identified effectively, the resulting donor-recipient word pairs can benefit many cross-lingual natural language processing tasks. However, most studies on loanword identification focus on formal texts such as news and government documents, and loanword identification in social media texts remains under-studied. Because the task faces many challenges and its output can be used in several downstream tasks, it deserves more attention. In this study, we present a multi-task learning architecture with deep bi-directional recurrent neural networks for loanword identification in social media texts, in which supervision for different tasks can be applied at different layers. The multi-task architecture learns higher-order feature representations from word and character sequences, together with basic spelling error checking, part-of-speech tagging, and named entity recognition information. Experimental results on Uyghur loanword identification in social media texts for five donor languages (Chinese, Arabic, Russian, Turkish, and Farsi) show that our method outperforms several strong baseline systems. We also add the loanword detection results to the training data of neural machine translation for low-resource language pairs. Experiments show that models trained on the extended datasets achieve significant improvements over the baseline models in all language pairs.
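The idea of supervising different tasks at different layers of a deep bi-directional recurrent network can be illustrated with a minimal, untrained forward-pass sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the plain tanh recurrent cell, the assignment of tasks to layers (spelling check lowest, loanword identification highest), the label counts, the dimensions, and all class and function names. The token vectors are random stand-ins for the word and character features.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class RNNCell:
    """Plain tanh recurrent cell (an assumed stand-in for the paper's cell type)."""
    def __init__(self, in_dim, hid_dim):
        self.w_in = rand_matrix(hid_dim, in_dim)
        self.w_rec = rand_matrix(hid_dim, hid_dim)
        self.hid_dim = hid_dim

    def run(self, inputs):
        h = [0.0] * self.hid_dim
        outputs = []
        for x in inputs:
            a, b = matvec(self.w_in, x), matvec(self.w_rec, h)
            h = [math.tanh(p + q) for p, q in zip(a, b)]
            outputs.append(h)
        return outputs

class BiRNNLayer:
    """Runs one cell left-to-right and one right-to-left, then concatenates."""
    def __init__(self, in_dim, hid_dim):
        self.fwd = RNNCell(in_dim, hid_dim)
        self.bwd = RNNCell(in_dim, hid_dim)

    def run(self, inputs):
        forward = self.fwd.run(inputs)
        backward = self.bwd.run(inputs[::-1])[::-1]
        return [f + b for f, b in zip(forward, backward)]  # list concatenation

class TaskHead:
    """Linear scorer that picks the highest-scoring label per token."""
    def __init__(self, in_dim, n_labels):
        self.w = rand_matrix(n_labels, in_dim)

    def predict(self, states):
        return [max(range(len(self.w)), key=matvec(self.w, s).__getitem__)
                for s in states]

class MultiTaskTagger:
    """Stack of BiRNN layers with one task head attached to each layer."""
    def __init__(self, in_dim, hid_dim, task_specs):
        # task_specs: (task_name, n_labels) pairs, lowest layer first (assumed order).
        self.layers, self.heads = [], []
        dim = in_dim
        for name, n_labels in task_specs:
            self.layers.append(BiRNNLayer(dim, hid_dim))
            dim = 2 * hid_dim  # forward and backward states are concatenated
            self.heads.append((name, TaskHead(dim, n_labels)))

    def tag(self, token_vectors):
        predictions, states = {}, token_vectors
        for layer, (name, head) in zip(self.layers, self.heads):
            states = layer.run(states)               # deeper representation
            predictions[name] = head.predict(states) # supervision point for this task
        return predictions

# Hypothetical task order and label counts, for illustration only.
TASKS = [("spell_check", 2), ("pos", 5), ("ner", 4), ("loanword", 2)]
model = MultiTaskTagger(in_dim=8, hid_dim=6, task_specs=TASKS)
sentence = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
predictions = model.tag(sentence)
```

In a trained system, each head would contribute its own loss at its layer, so that lower layers are shaped by the auxiliary tasks while the top layer is shaped by loanword identification. The sketch only shows where those supervision points would sit.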

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Mar 24, 2023
Citations: 1

