
Towards achieving a delicate blending between rule-based translator and neural machine translator

Abstract

Popular translators such as Google and Bing perform quite well when translating between popular languages such as English and French; however, they make elementary mistakes when translating low-resource languages such as Bengali and Arabic. Google uses the Neural Machine Translation (NMT) approach to build its multilingual translation system. Prior to NMT, Google used the Statistical Machine Translation (SMT) approach. However, both approaches depend solely on the availability of a large parallel corpus for the language pair being translated. As a result, a good number of widely spoken languages, such as Bengali, remain little explored in artificial intelligence research. Hence, the goal of this study is to explore improved translation from Bengali to English. To do so, we study both the rule-based translator and the corpus-based machine translators (NMT and SMT) in isolation and in combination, using different approaches to blending them. More specifically, we first adopt popular corpus-based machine translators (NMT and SMT) and a rule-based machine translator for Bengali to English translation. Next, we integrate the rule-based translator with each of the corpus-based machine translators separately using different approaches. We also perform rigorous experimentation over different datasets, compare the different approaches in terms of translation performance, and report the best performance score for Bengali to English translation to date. Finally, we discuss how our different blending approaches can be re-used for other low-resource languages.

Similar Papers
  • Research Article
  • 10.5445/ir/1000104498
Multilingual Neural Translation
  • Feb 14, 2020
  • Repository KITopen (Karlsruhe Institute of Technology)
  • Thanh-Le Ha

  • Research Article
  • 10.25073/2588-1086/vnucsce.231
Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language
  • May 30, 2020
  • VNU Journal of Science: Computer Science and Communication Engineering
  • Nghia-Luan Pham + 1 more

In this paper, we propose a new method for domain adaptation in Statistical Machine Translation (SMT) for low-resource domains in the English-Vietnamese language pair. Specifically, our method uses only monolingual data to adapt the translation phrase-table, and it brings improvements over the SMT baseline system. We propose two steps to improve the quality of the SMT system: (i) classify phrases on the target side of the translation phrase-table using a probabilistic classifier model, and (ii) adapt the phrase-table by recomputing the direct translation probability of phrases.
 
Our experiments are conducted in the English-to-Vietnamese translation direction on two very different domains: the legal domain (out-of-domain) and the general domain (in-domain). The English-Vietnamese parallel corpus is provided by the IWSLT 2015 organizers, and the experimental results show that our method significantly outperforms the baseline system. Our system improved machine translation quality in the legal domain by up to 0.9 BLEU points over the baseline system,…
Keywords: Machine Translation, Statistical Machine Translation, Domain Adaptation

  • Research Article
  • Cited by 6
  • 10.18517/ijaseit.8.4-2.6816
Hybrid Machine Translation with Multi-Source Encoder-Decoder Long Short-Term Memory in English-Malay Translation
  • Sep 26, 2018
  • International Journal on Advanced Science, Engineering and Information Technology
  • Yin-Lai Yeong + 3 more

Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) are the state-of-the-art approaches in machine translation (MT). The translation produced by an SMT system is based on the statistical analysis of text corpora, while NMT uses a deep neural network to model and generate a translation. SMT and NMT each have their strengths and weaknesses. SMT may produce better translations than NMT when only a small parallel text corpus is available. Nevertheless, when the amount of parallel text available is large, the quality of the translation produced by NMT is often higher than that of SMT. Studies have also shown that the translation produced by SMT is better than that of NMT in cases where there is a domain mismatch between training and testing. SMT also has an advantage on long sentences. In addition, when a translation produced by an NMT system is wrong, it is very difficult to find the error. In this paper, we investigate a hybrid approach that combines SMT and NMT to perform English to Malay translation. The motivation for using hybrid machine translation is to combine the strengths of both approaches to produce a more accurate translation. Our approach uses the multi-source encoder-decoder long short-term memory (LSTM) architecture. The architecture uses two encoders, one to embed the sentence to be translated, and another to embed the initial translation produced by SMT. The translation from the SMT system can be viewed as a “suggestion translation” to the neural MT. Our experiments show that the hybrid MT increases the BLEU scores of our best baseline machine translation in the computer science domain and the news domain from 21.21 and 48.35 to 35.97 and 61.81, respectively.

  • Research Article
  • Cited by 9
  • 10.1145/3610582
A Pragmatic Analysis of Machine Translation Techniques for Preserving the Authenticity of the Sanskrit Language
  • Jul 25, 2023
  • ACM Transactions on Asian and Low-Resource Language Information Processing
  • Nandini Sethi + 4 more

Machine Translation has been a field of study for over six decades, but it has acquired substantial prominence in the last decade as processing capacity in personal computers has increased. The purpose of this paper is to discuss the usage of Sanskrit as a source, target, or supporting language in various Machine Translation systems. To investigate Machine Translation, researchers use a variety of strategies, including corpus-based, direct, and rule-based approaches. The primary goal of employing Sanskrit in Machine Translation is to evaluate its appropriateness, lexicon, and performance when proper Machine Translation methods are used. The research examines various modelling strategies for developing a machine translation system, specifically Statistical and Neural Machine Translation, in order to bridge the gap between Sanskrit and its current successor, Hindi. In Statistical Machine Translation, translations are formed by matching words from the source and target languages, using statistical models and bilingual text corpora to learn parameters. Neural Machine Translation, on the other hand, uses an artificial neural network to predict the likelihood of a word sequence, frequently modelling entire phrases within a single integrated model. Neural Machine Translation is implemented using an encoder-decoder architecture with an attention mechanism. One of the most significant contributions of this paper is the use of different data sources, data collection, and scraping to create a complete dataset. According to the study's findings, Neural Machine Translation outperforms the Statistical Machine Translation modelling technique. Furthermore, the paper examines the distinctive qualities of the Sanskrit language as well as the difficulties encountered by researchers in processing Sanskrit while constructing the machine translation system.
This study investigates the use of Sanskrit in Machine Translation and analyses several modelling methods, such as Statistical and Neural Machine Translation. The paper emphasizes the advantages of Neural Machine Translation and discusses the unique characteristics and challenges of the Sanskrit language in machine translation development.

  • Conference Article
  • Cited by 16
  • 10.1109/compe49325.2020.9200059
Low Resource and Domain Specific English to Khasi SMT and NMT Systems
  • Jul 1, 2020
  • Thoudam Doren Singh + 1 more

Machine translation systems for low-resource languages face challenges in terms of quality and understanding. Our work focuses on English to Khasi translation using two representative approaches: statistical and neural machine translation. As part of this system development, we built an English-Khasi parallel dataset from existing domain-specific literature. The translation quality of the statistical machine translation (SMT) and neural machine translation (NMT) systems in a low-resource and domain-specific setting is substantially analyzed using automatic and subjective evaluation techniques.

  • Conference Article
  • Cited by 4
  • 10.24963/ijcai.2018/789
From Feature to Paradigm: Deep Learning in Machine Translation (Extended Abstract)
  • Jul 1, 2018
  • Marta R Costa-Jussà

In recent years, deep learning algorithms have revolutionized several areas, including speech, image, and natural language processing. The specific field of Machine Translation (MT) has not remained unaffected. Integration of deep learning in MT varies from re-modeling existing features into standard statistical systems to the development of a new architecture. Among the different neural networks, research works use feed-forward neural networks, recurrent neural networks, and the encoder-decoder schema. These architectures are able to tackle challenges such as low-resource settings or morphological variation. This extended abstract focuses on describing the foundational works on the neural MT approach, mentioning its strengths and weaknesses, and including an analysis of the corresponding challenges and future work. The full manuscript [Costa-jussà, 2018] describes, in addition, how these neural networks have been integrated to enhance different aspects and models from statistical MT, including language modeling, word alignment, translation, reordering, and rescoring. It also describes the new neural MT approach together with recent approaches using subwords, characters, and multilingual training, among others.

  • Research Article
  • Cited by 89
  • 10.3389/fdigh.2018.00009
Post-editing Effort of a Novel With Statistical and Neural Machine Translation
  • May 15, 2018
  • Frontiers in Digital Humanities
  • Antonio Toral + 2 more

We conduct the first experiment in the literature in which a novel is translated automatically and then post-edited by professional literary translators. Our case study is Warbreaker, a popular fantasy novel originally written in English, which we translate into Catalan. We translated one chapter of the novel (over 3,700 words, 330 sentences) with two data-driven approaches to Machine Translation (MT): phrase-based statistical MT (PBMT) and neural MT (NMT). Both systems are tailored to novels; they are trained on over 100 million words of fiction. In the post-editing experiment, six professional translators with previous experience in literary translation translate subsets of this chapter under three alternating conditions: from scratch (the norm in the novel translation industry), post-editing PBMT, and post-editing NMT. We record all the keystrokes, the time taken to translate each sentence, as well as the number of pauses and their duration. Based on these measurements, and using mixed-effects models, we study post-editing effort across its three commonly studied dimensions: temporal, technical and cognitive. We observe that both MT approaches result in increases in translation productivity: PBMT by 18%, and NMT by 36%. Post-editing also leads to reductions in the number of keystrokes: by 9% with PBMT, and by 23% with NMT. Finally, regarding cognitive effort, post-editing results in fewer (29 and 42% less with PBMT and NMT, respectively) but longer pauses (14 and 25%).

  • Research Article
  • Cited by 14
  • 10.1145/3591207
A Novel Neural Machine Translation Approach for low-resource Sanskrit-Hindi Language pair
  • Apr 8, 2023
  • ACM Transactions on Asian and Low-Resource Language Information Processing
  • Nandini Sethi + 2 more

Sanskrit is one of the earliest native languages and is often described as "the gods' language" because of its wide use in Indian religious literature from the past. However, it is becoming less popular in modern India. Due in significant part to the need for more materials for translation both into and out of Sanskrit, it is no longer commonly used. This study explores the feasibility of using machine translation (MT) to provide a link between Sanskrit and its contemporary descendant, Hindi. A comparative study was conducted between existing modelling methodologies, notably statistical machine translation (SMT), and the proposed novel deep learning-based machine translation strategy, using a manually created parallel corpus for the Sanskrit-Hindi language pair. While SMT creates translations by mapping phrases between the source and target languages, using statistical models and bilingual text corpora to learn parameters, neural machine translation (NMT) frequently models entire phrases in a single integrated model, using a convolutional neural network to calculate the probability of a word sequence. The proposed NMT model is implemented using an encoder-decoder with an attention mechanism and the inclusion of gated recurrent units. Our approach involved the development of a novel model for Sanskrit-Hindi machine translation using deep learning and the creation of parallel corpora for the Sanskrit-Hindi language pair. The proposed model is evaluated on automated and human-based metrics, and results show that our proposed deep learning-based model outperforms the statistical modelling techniques built on Moses, with a BLEU score of 53.8% compared to 34.56%. This article examines the largely unexplored area of machine translation from Sanskrit to Hindi and discusses the main benefits and drawbacks of statistical and neural machine translation while providing a fresh viewpoint on the subject.

  • Conference Article
  • Cited by 9
  • 10.1109/iccct53315.2021.9711807
Machine Learning Approach to English-Afaan Oromo Text-Text Translation: Using Attention based Neural Machine Translation
  • Dec 16, 2021
  • Ebisa A Gemechu + 1 more

In this paper, we present a Neural Machine Translation (NMT) approach for English-Afaan Oromo text translation. NMT is a machine translation technique that applies an artificial neural network to predict the probability of a sequence of words. It is enhanced by Recurrent Neural Networks (RNN) arranged as encoder-decoder networks. There have been some noticeable challenges with English-Afaan Oromo machine translation using conventional rule-based and Statistical Machine Translation (SMT) systems. Text alignment was the main challenge, since the two languages differ in structure. Afaan Oromo is a morphologically rich language, which makes it difficult to develop rules for the syntactic and semantic elements. Our proposal is to develop a machine translation model for English-Afaan Oromo text using the attention-based NMT technique. This technique overcomes most of the text alignment and rule tagging limitations. The train/test split technique is used to train and test our model. The Adam algorithm is adopted for training optimization. We used the BLEU score and a human rating on a Likert scale for evaluation. Experiments show that our model performs well in translation, with an average BLEU score of 41.62 on the test sets. This result outperforms the previous baseline systems evaluated on English-Afaan Oromo.
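Several abstracts on this page report BLEU scores. As a rough illustration of what that number measures, here is a minimal sentence-level sketch: the geometric mean of modified n-gram precisions (n = 1 to 4) multiplied by a brevity penalty, against a single reference and without smoothing. It is a simplified assumption-laden sketch, not the evaluation script used by any of the papers listed here; published scores are normally corpus-level and computed with standard tooling. The function names and whitespace tokenization are illustrative choices.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU on a 0-100 scale."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * brevity * math.exp(log_precision_sum / max_n)


exact = sentence_bleu("the cat sat on the mat", "the cat sat on the mat")
partial = sentence_bleu("the cat sat on the mat", "the cat sat on the red mat")
```

An exact match scores 100, and the `partial` case scores strictly between 0 and 100 because some higher-order n-grams are missing and the candidate is shorter than the reference.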

  • Conference Article
  • Cited by 11
  • 10.1109/icbslp47725.2019.201502
Neural vs Statistical Machine Translation: Revisiting the Bangla-English Language Pair
  • Sep 1, 2019
  • Md Arid Hasan + 3 more

Machine translation systems facilitate our communication and access to information, taking down language barriers. It is a well-researched area of Natural Language Processing (NLP), especially for resource-rich languages (e.g., language pairs in the Europarl Parallel corpus). Besides these languages, there is also work on other language pairs, including the Bangla-English language pair. In the current study, we aim to revisit both Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) approaches using well-known, publicly available corpora for the Bangla-English (Bangla to English) language pair. We report how the performance of the models differs based on the data and modeling techniques, and we also compare the results obtained with Google’s machine translation system. Our findings, across different corpora, indicate that NMT-based approaches outperform SMT systems. Our results also outperform existing baselines by a large margin.

  • Conference Article
  • Cited by 8
  • 10.1109/icter51097.2020.9325431
A Comparison of Transformer, Recurrent Neural Networks and SMT in Tamil to Sinhala MT
  • Nov 4, 2020
  • Ashmari Pramodya + 2 more

Neural Machine Translation (NMT) is currently the most promising approach for machine translation. The attention mechanism is a successful technique in modern Natural Language Processing (NLP), especially in tasks like machine translation. The recently proposed Transformer network architecture is based entirely on attention mechanisms and achieves new state-of-the-art results in neural machine translation, outperforming other sequence-to-sequence models. Although it is successful in a resource-rich setting, its applicability to low-resource language pairs is still debatable. Additionally, when the language pair is morphologically rich and the corpus is multi-domain, the lack of a large parallel corpus becomes a significant barrier. In this study, we explore different NMT algorithms, namely Long Short Term Memory (LSTM) and Transformer-based NMT, to translate the Tamil to Sinhala language pair. We clearly see that the Transformer outperforms LSTM by 2.43 BLEU points in the Tamil to Sinhala direction. This work also provides a preliminary comparison of statistical machine translation (SMT) and Neural Machine Translation (NMT) for Tamil to Sinhala in the open-domain context.

  • Conference Article
  • Cited by 20
  • 10.18653/v1/p18-1116
Forest-Based Neural Machine Translation
  • Jan 1, 2018
  • Chunpeng Ma + 4 more

Tree-based neural machine translation (NMT) approaches, although they have achieved impressive performance, suffer from a major drawback: they only use the 1-best parse tree to direct the translation, which potentially introduces translation mistakes due to parsing errors. For statistical machine translation (SMT), forest-based methods have been proven to be effective for solving this problem, while for NMT this kind of approach has not been attempted. This paper proposes a forest-based NMT method that translates a linearized packed forest under a simple sequence-to-sequence framework (i.e., a forest-to-sequence NMT model). The BLEU score of the proposed method is higher than that of the sequence-to-sequence NMT, tree-based NMT, and forest-based SMT systems.

  • Research Article
  • Cited by 12
  • 10.1007/s10590-021-09266-0
An in-depth analysis of the individual impact of controlled language rules on machine translation output: a mixed-methods approach
  • Jun 1, 2021
  • Machine Translation
  • Shaimaa Marzouk

Examining the general impact of Controlled Language (CL) rules in the context of Machine Translation (MT) has been an area of research for many years. The present study focuses on the following question: how do CL rules impact MT output individually? By analysing a German corpus-based test suite of technical texts that have been translated into English by different MT systems, this study endeavours to answer this question at different levels: the general impact of CL rules (rule- and system-independent), their impact at rule level (system-independent) as well as at rule and system level. The results of five MT systems are analysed and contrasted: a rule-based system, a statistical system, two differently constructed hybrid systems, and a neural system. For this, a mixed-methods triangulation approach that includes error annotation, human evaluation, and automatic evaluation was applied. The data was analysed both qualitatively and quantitatively in terms of CL influence on the following parameters: number and type of MT errors, style and content quality, and scores of two automatic evaluation metrics. In line with many studies, the results show a general positive impact of the applied CL rules on the MT output. However, at rule level, only four rules proved to have positive effects on the aforementioned parameters; three rules had negative effects on the parameters; and two rules did not show any significant impact. At rule and system level, the rules affected the MT systems differently, as expected. Rules that had a positive impact on earlier MT approaches did not show the same impact on the neural MT approach. Furthermore, neural MT delivered distinctly better results than earlier MT approaches, namely the highest error-free, style and content quality rates both before and after applying the rules, which indicates that neural MT offers a promising solution that no longer requires CL rules for improving the MT output.

  • Research Article
  • 10.3844/jcssp.2025.3041.3050
Evaluating Machine Translation for Domain Specific Low-Resource Nepali-English Language Pairs: The Impact of Tokenization on Statistical and Neural Techniques
  • Dec 1, 2025
  • Journal of Computer Science
  • Amit Kumar Roy + 1 more

In the modern era, the field of Machine Translation (MT) has seen a significant shift towards Neural Machine Translation (NMT) techniques, which have surpassed traditional Statistical Machine Translation (SMT) models in terms of the quality of translation. Despite this, the efficacy of these techniques may differ based on the language combination in consideration. While SMT is somewhat more flexible in this regard, NMT often needs sizable parallel corpora to attain high translation accuracy. As a result, a benchmark system capable of offering sufficient translation for languages with limited resources, like Nepali, remains a pipe dream. This paper focuses on translating text using statistical and neural MT techniques for the under-resourced English-Nepali language pair. As a part of this system development, we built a parallel corpus of English-Nepali in the tourism domain. We explore the impact of different tokenization techniques on translation outcomes. A substantial analysis is also done for the performance of both approaches using automatic evaluation metrics, BLEU and TER. This paper aims to provide insights into the applicability of SMT and NMT for the under-resourced English-Nepali language pair in light of two popular epitomes of tokenization and to determine the most effective approach for achieving accurate translations.

  • Book Chapter
  • Cited by 2
  • 10.1007/978-981-19-9422-7_12
Rendering Morphosyntactic Features of Legal Spanish Judgments Using Neural and Statistical Machine Translation
  • Jan 1, 2023
  • Jeffrey Killman

Addressing machine translation (MT) in the legal context, this chapter compares Spanish-to-English neural MT (NMT) and statistical MT (SMT) output from the same MT provider during these different paradigm periods. The chapter focuses on translation renditions of various morphosyntactic features originating from a morphosyntactically complex text of judgment summaries issued by the Supreme Court of Spain. On the one hand, the chapter evaluates NMT and SMT translation solutions of frequent morphosyntactic features revealed by means of corpus analysis software in the areas of verb and subject order and active and passive voice. On the other, a set of complex sentences is analysed to reveal how NMT and SMT translations of these and other morphosyntactic features might fare under more dispersed or broader contextual conditions. NMT appears to provide more adequate solutions in the case of most of the individual morphosyntactic features analysed, as well as a tendency to be more consistently reliable from a grammatical equivalence perspective. SMT, however, may provide more peculiar or contextually desirable solutions in certain cases, but these solutions cannot be relied upon as much as the syntactically thorough or grammatically equivalent approaches provided by NMT on a more consistent basis.
