Finding Better Subwords for Tibetan Neural Machine Translation

Yachao Li, Ning Ma, Jing Jiang, Jia Yangji

doi:10.1145/3448216

Abstract

Subword segmentation plays an important role in Tibetan neural machine translation (NMT). Tibetan words have a two-level structure: a word is a sequence of syllables, and each syllable is a sequence of characters. Based on this structure, we propose two methods for Tibetan subword segmentation, one syllable-based and one character-based. The former generates subwords from Tibetan syllables, and the latter from Tibetan characters. We evaluate both methods on low-resource Tibetan-to-Chinese NMT. The experimental results show that both improve translation performance, with character-based segmentation achieving the better results. Overall, the proposed character-based subword segmentation is simpler and more effective, and it achieves better results without requiring much attention to the linguistic features of Tibetan.
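
The two-level word structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that syllables are delimited by the Tibetan tsheg mark (U+0F0B) and that a "character" is a single Unicode code point, and the function names `to_syllables` and `to_characters` are hypothetical. A subword learner would then operate on one of these two unit sequences.

```python
from typing import List

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (assumed syllable delimiter)


def to_syllables(word: str) -> List[str]:
    """Split a Tibetan word into syllables at the tsheg mark."""
    return [s for s in word.split(TSHEG) if s]


def to_characters(syllable: str) -> List[str]:
    """Split a syllable into Unicode code points (assumed 'characters')."""
    return list(syllable)


# Example word written with escapes: two syllables of three code points each.
word = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42"

syllables = to_syllables(word)                       # syllable-level units
characters = [to_characters(s) for s in syllables]   # character-level units

print(len(syllables), [len(c) for c in characters])
```

Treating each code point as a character means that vowel signs and subjoined letters become separate units; whether the paper groups them differently is not stated in the abstract.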

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Mar 15, 2021
Citations: 5

Similar Papers

SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation
Haiyue Song ... Eiichiro Sumita
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 22
24 Aug 2023

Neural Machine Translation for Low-resource English-Bangla
Mohammad Abdullah Al Mumin ... Muhammed Zafar Iqbal
Journal of Computer Science | VOL. 15
01 Nov 2019

A Compression-Based Multiple Subword Segmentation for Neural Machine Translation
Keita Nonaka ... Tomohiro I
Electronics | VOL. 11
24 Mar 2022

Bilingual Subword Segmentation for Neural Machine Translation
Hiroyuki Deguchi ... Akihiro Tamura
01 Jan 2020
