Phonology-Augmented Statistical Framework for Machine Transliteration Using Limited Linguistic Resources

Gia H Ngo, Minh Nguyen, Nancy F Chen

doi:10.1109/taslp.2018.2875269

Abstract

Transliteration converts words in a source language (e.g., English) into words in a target language (e.g., Vietnamese). This conversion must respect the phonological structure of the target language, because the transliterated output needs to be pronounceable in that language. For example, a Vietnamese word that begins with a consonant cluster is phonologically invalid and would therefore be an incorrect output of a transliteration system. Most statistical transliteration approaches, although widely adopted, do not explicitly model the target language's phonology, which often results in invalid outputs. The problem is compounded by the limited linguistic resources available when converting foreign words to transliterated words in the target language. In this work, we present a phonology-augmented statistical framework suitable for transliteration, especially when only limited linguistic resources are available. We propose the concept of pseudo-syllables as structures representing how segments of a foreign word are organized according to the syllables of the target language's phonology. We performed transliteration experiments on Vietnamese and Cantonese. We show that the proposed framework outperforms the statistical baseline by up to 44.68% relative when training examples are limited (587 entries).
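
The phonological constraint mentioned in the abstract (no consonant cluster at the start of a word) can be illustrated with a minimal sketch. This is not the authors' framework: the vowel set, the phoneme symbols, and the function names below are hypothetical and chosen only for illustration, and a syllable is assumed to be given as a list of phoneme symbols rather than as spelled text.

```python
# Hypothetical, simplified vowel inventory for illustration only;
# it is not taken from the paper and is not a real Vietnamese inventory.
VOWELS = {"a", "e", "i", "o", "u"}


def onset_length(phonemes):
    """Count the consonants that precede the first vowel of a syllable."""
    count = 0
    for p in phonemes:
        if p in VOWELS:
            break
        count += 1
    return count


def has_valid_onset(phonemes, max_onset=1):
    """Return True if the syllable starts with at most `max_onset` consonants.

    max_onset=1 encodes the assumption that onset consonant clusters
    are not allowed in the target language.
    """
    return onset_length(phonemes) <= max_onset


print(has_valid_onset(["b", "a"]))       # single-consonant onset
print(has_valid_onset(["s", "t", "a"]))  # onset is a consonant cluster
```

A check of this kind could act as a filter on candidate outputs; how the paper's pseudo-syllables actually enforce such constraints is not described in the abstract.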

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2019
Citations: 8
