Statistical Machine Translation Models Research Articles

Machine translation (MT) systems have been built using numerous techniques for bridging language barriers. These techniques are broadly categorized into Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). End-to-end NMT systems significantly outperform SMT in translation quality on many language pairs, especially those with adequate parallel corpora. We report comparative experiments on baseline MT systems for Assamese to other Indo-Aryan languages (in both translation directions) using traditional Phrase-Based SMT as well as some more successful NMT architectures, namely a basic sequence-to-sequence model with attention, the Transformer, and a finetuned Transformer. Since this is the first such work involving Assamese, the results are evaluated with the most prominent and widely used automatic metric, BLEU (BiLingual Evaluation Understudy), as well as other well-known metrics, to explore the performance of the different baseline MT systems. The evaluation scores of the SMT and NMT models are compared across bidirectional language pairs involving Assamese and other Indo-Aryan languages (Bangla, Gujarati, Hindi, Marathi, Odia, Sinhalese, and Urdu). The highest BLEU scores obtained are for Assamese to Sinhalese with SMT (35.63) and for Assamese to Bangla with NMT (seq2seq 50.92, Transformer 50.01, finetuned Transformer 50.19). We also try to relate the results to language characteristics, distances, family trees, domains, data sizes, and sentence lengths. We find that domain is the most important factor affecting the results for the given data domains and sizes. We compare our results with the only existing MT system for Assamese (Bing Translator) and also with pairs involving Hindi.
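The abstract reports corpus-level BLEU scores. As a rough illustration of what that metric computes, here is a minimal Python sketch of BLEU, assuming one reference per sentence, whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical. The abstract does not say which BLEU implementation the authors used, so this sketch is not expected to reproduce their numbers; published results are normally computed with an established tool such as sacreBLEU, which also standardizes tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence (sketch, 0-100 scale).

    Uses clipped n-gram precision for n = 1..max_n, a uniform geometric mean,
    and the brevity penalty. No smoothing: if any n-gram order has zero
    matches, the score is 0.0.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_toks, ref_toks = hyp.split(), ref.split()  # assumed whitespace tokenization
        hyp_len += len(hyp_toks)
        ref_len += len(ref_toks)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_toks, n)
            ref_counts = ngrams(ref_toks, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_toks) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses that are shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


hyps = ["the cat sat on the mat"]
refs = ["the cat sat on the mat"]
print(corpus_bleu(hyps, refs))  # 100.0
```

If a hypothesis is a strict prefix of its reference, every n-gram precision is 1 and the score is determined entirely by the brevity penalty. This is why BLEU pairs precision with a length term instead of relying on precision alone.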

With growing demand for translation, advances in information technology, developments in linguistic theory, and progress in natural language understanding models in artificial intelligence research, machine translation has gradually gained worldwide attention. However, machine translation research still faces problems such as insufficient bilingual data and a lack of effective feature representations. These problems limit further improvement of key modules such as word alignment, reordering, and translation modelling, and translation quality remains unsatisfactory. Deep neural networks, a newer machine learning method, can automatically learn abstract feature representations and establish complex mappings between input and output signals, which offers a new direction for statistical machine translation research. First, a multi-layer neural network is combined with an undirected probabilistic graphical model so that word similarity and context information are exploited to model word alignment more fully; the resulting word alignment model is named NNWAM. Second, low-dimensional feature representations are combined with other features in a linear ordering model to build a pre-ordering model named NNPR. Finally, the word alignment model and the pre-ordering model are combined in a single deep neural network framework to form DNNAPM, a statistical machine translation model based on deep neural networks. The experimental results show that this model performs better, converges faster, and is more reliable than the models it was compared against.
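The abstract names word alignment as a key SMT module but does not describe NNWAM in enough detail to reproduce it. To show what the alignment task itself involves, here is a sketch of the classic IBM Model 1 aligner trained with expectation-maximization. This is a standard count-based baseline, not the authors' neural model. The function names `train_ibm1` and `best_alignment` and the toy German–English corpus are illustrative assumptions. For brevity, the sketch also omits the NULL source word that full IBM Model 1 includes.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """Estimate lexical translation probabilities t(f | e) with EM (IBM Model 1).

    pairs: list of (source_tokens, target_tokens). For brevity this sketch
    omits the NULL source word that full IBM Model 1 includes.
    """
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Uniform initialization: every t(f | e) starts at 1 / |target vocabulary|.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in pairs:
            for f in tgt:
                # E-step: share each target word's alignment mass over all source words.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    p = t[(f, e)] / norm
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize expected counts into probabilities per source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_alignment(src, tgt, t):
    """Link each target position j to the source position i maximizing t(tgt[j] | src[i])."""
    return [(j, max(range(len(src)), key=lambda i: t[(f, src[i])]))
            for j, f in enumerate(tgt)]


corpus = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split()),
          ("ein Buch".split(), "a book".split())]
t = train_ibm1(corpus)
print(round(t[("the", "das")], 3))
print(best_alignment("das Haus".split(), "the house".split(), t))  # [(0, 0), (1, 1)]
```

Nothing in the toy corpus says which word translates which. The alignment emerges because "das" co-occurs with "the" in two sentence pairs, and EM progressively concentrates probability mass on such consistent co-occurrences. Neural alignment models like the one described above aim to add word similarity and context, which this purely count-based model ignores.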

Articles published on Statistical Machine Translation Models

Efficient incremental training using a novel NMT-SMT hybrid framework for translation of low-resource languages.

Statistical machine translation for Indic languages

Automatic translation from English to Amazigh using transformer learning

On the use of statistical machine translation for suggesting variable names for decompiled code: The Pharo case

Malayalam Natural Language Processing: Challenges in Building a Phrase-Based Statistical Machine Translation System

Reordering of Source Side for a Factored English to Manipuri SMT System

Improved Unsupervised Statistical Machine Translation via Unsupervised Word Sense Disambiguation for a Low-Resource and Indic Languages

Is the Corpus Ready for Machine Translation? A Case Study with Python to Pseudo-Code Corpus.

Improving Neural Machine Translation by Transferring Knowledge from Syntactic Constituent Alignment Learning

Knowledge Distillation: A Method for Making Neural Machine Translation More Efficient

Low Resource Neural Machine Translation: Assamese to/from Other Indo-Aryan (Indic) Languages

Generating Chinese Classical Poems with Statistical Machine Translation Models

Hybrid System Combination Framework for Uyghur–Chinese Machine Translation

Improving Transformer‐Based Neural Machine Translation with Prior Alignments

Transliterating Nôm Scripts into Vietnamese National Scripts using Statistical Machine Translation

Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language

Improving neural machine translation through phrase-based soft forced decoding

Revisiting Back-Translation for Low-Resource Machine Translation Between Chinese and Vietnamese

Research on statistical machine translation model based on deep neural network