Referential explicitation in translations by large language models, neural machine translation system, and human translators across genres

Abstract

This study examines referential explicitation in machine translation (MT) based on Large Language Models (LLMs), compared with neural machine translation (NMT) and human translation. Referential explicitation, the process of making implicit referential expressions explicit in the target text, is recognized as a crucial aspect of translation studies. A Multidimensional Analysis (MDA) indicates that all three translation modes perform referential explicitation across genres, including news, academic, and fiction. The degree of explicitation in LLM-based machine translation falls between that of human translation and that of neural machine translation. Moreover, machine translations prioritize clarity and transparency, whereas human translations tend to preserve implicit references, especially in fiction. This difference underscores the challenge neural machine translation faces in balancing clarity with cultural nuance. Notably, LLM-based machine translations show improvements, coming closer to human translators owing to an underlying logic distinct from that of neural machine translation. Additionally, genre-specific patterns reveal consistently high levels of explicitation in news and academic texts, whereas fiction translations vary in how far they preserve implicit references. This study enhances the theoretical understanding of translation universals and contributes to the development of advanced evaluation metrics for machine translation.
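Multidimensional Analysis in Biber's tradition scores each text on a dimension by standardizing feature rates across the corpus and summing the z-scores of positively loaded features minus those of negatively loaded ones. The abstract does not list the features or loadings used, so the following is only a minimal sketch of that general scheme; the feature names (`full_noun_phrases`, `pronouns`), the rates, and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def zscore(value, m, sd):
    """Standardize one feature rate; a constant feature contributes 0."""
    return 0.0 if sd == 0 else (value - m) / sd


def dimension_scores(texts, positive, negative):
    """Biber-style dimension scores (illustrative sketch).

    texts: list of dicts mapping feature name -> rate per 1,000 words.
    positive / negative: features assumed to load positively / negatively
    on the dimension. The feature choice here is hypothetical.
    """
    # Corpus-wide mean and population standard deviation for each feature.
    stats = {}
    for feature in positive + negative:
        values = [text[feature] for text in texts]
        stats[feature] = (mean(values), pstdev(values))

    scores = []
    for text in texts:
        pos = sum(zscore(text[f], *stats[f]) for f in positive)
        neg = sum(zscore(text[f], *stats[f]) for f in negative)
        scores.append(pos - neg)
    return scores


# Hypothetical rates: a more explicit text uses more full noun phrases
# and fewer pronouns for the same referents.
texts = [
    {"full_noun_phrases": 30, "pronouns": 10},
    {"full_noun_phrases": 20, "pronouns": 20},
    {"full_noun_phrases": 10, "pronouns": 30},
]
scores = dimension_scores(texts, ["full_noun_phrases"], ["pronouns"])
print([round(s, 3) for s in scores])  # [2.449, 0.0, -2.449]
```

A higher score here means a text sits further toward the "explicit reference" pole of this hypothetical dimension.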

Similar Papers
  • Dissertation
  • 10.11606/d.8.2024.tde-10122024-105745
Decoding spatial semantics: a comparative analysis of the performance of open-source LLMs against NMT systems in translating EN-PT-BR subtitles
  • Aug 6, 2024
  • Rafael Macário Fernandes

This master's thesis investigates the challenges of translating spatial language with open-source Large Language Models (LLMs) compared with traditional Neural Machine Translation (NMT) systems. It addresses the problem of accurately translating the semantics of spatial prepositions such as ACROSS, INTO, ONTO, and THROUGH, which are often rendered by similar verbal or prepositional forms when translating from English into Brazilian Portuguese (EN-PT-BR). Translating these prepositions correctly is crucial for maintaining the semantic integrity of the source content while ensuring fluency and adherence to the lexicalization patterns of the target language (House 2018; Talmy 2000b; Slobin 2005). The research begins by contextualizing the challenges of spatial language translation, highlighting the limitations of current NMT systems and the potential advantages of LLMs. A comprehensive literature review traces the evolution of translation theories, the development of NMT, and the rise of LLMs, while also describing the potential limitations of the current approach. The methodology employs a corpus-based analysis, assembling a bilingual dataset of TED Talks subtitles from the OPUS platform, centered on spatial prepositions. This dataset was meticulously pre-processed to support both automated metrics and manual error analysis. The evaluation metrics include BLEU, METEOR, BERTScore, COMET, and TER, while the manual error analysis identifies and categorizes the types of errors each system makes. The findings reveal that moderate-sized LLMs such as LLaMa-3-8B and Mixtral-8x7B achieve accuracy close to that of NMT systems such as Google, although this relationship is not always linear, as models like Gemma-7B showed similar performance in human reviews. However, LLMs generally produced other serious mistranslation errors, including interlanguage/code-switching (in) and anglicism (an) errors, failing to convey idiomaticity in the target language.
Conversely, NMT systems achieved better general fluency and precision in machine translation tasks. Manual error analysis, on the other hand, underscores the ongoing challenges that both LLMs and NMT systems face in translating the nuances of spatial language, with both groups presenting consistent numbers of errors such as polysemy (po) and syntactic projection (sp) errors, where they either fail to translate a preposition's appropriate meaning or copy the lexicalization patterns of the source text into the target text (Fernandes et al. 2024; Oliveira and Fernandes 2022). The thesis concludes that despite the advances in LLMs, significant hurdles remain in translating spatial language accurately. It suggests that future research should focus on enhancing training datasets, refining model architectures, and developing more sophisticated evaluation metrics that better capture the semantic subtleties of spatial language. This study contributes to the field by providing a detailed comparison of model performance in EN-PT-BR spatial language translation and by proposing directions for future improvements.
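TER, one of the metrics listed above, counts the edits needed to turn a hypothesis into the reference, normalized by reference length; full TER also lets a block shift count as a single edit. The sketch below is a simplified illustration that omits shifts, so it behaves more like word error rate than full TER. Reported scores should come from a maintained implementation such as sacreBLEU's TER.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over tokens (insert, delete, substitute = 1)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete hypothesis token h
                cur[j - 1] + 1,          # insert reference token r
                prev[j - 1] + (h != r),  # substitute, or match at cost 0
            )
        prev = cur
    return prev[-1]


def ter_without_shifts(hypothesis, reference):
    """Edits per reference token; TER's block shifts are NOT modelled."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return word_edit_distance(hyp, ref) / len(ref)


print(ter_without_shifts("the cat sat on mat", "the cat sat on the mat"))
```

Here one missing token against a six-token reference gives 1/6; lower is better, and an exact match scores 0.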

  • Research Article
  • Cited by 1
  • 10.22363/2521-442x-2025-9-1-10-27
Mind vs. machine: Comparative analysis of metaphor-related word translation by human and AI systems
  • Mar 24, 2025
  • Training, Language and Culture
  • Zhengjian Li + 1 more

This study presents a comparative analysis of the translation processes and outcomes of human translators, Neural Machine Translation (NMT) systems, and Large Language Models (LLMs), focusing on the translation of Metaphor-related Words (MRWs). It employs several research methods, including product analysis, think-aloud protocols, follow-up interviews, and translation quality assessment, to uncover the strategies the different subject groups choose when translating MRWs and how those choices relate to quality criteria. Human translators and LLMs tend to favour strategies such as rendering a metaphor as a different metaphor (M→M2) and metaphor reduction (M→Non), while NMT systems prefer reproducing metaphors (M→M). LLMs show translation patterns more closely aligned with those of human translators, which helps them achieve higher evaluation scores, though their performance remains inconsistent, particularly with novel metaphors. Additionally, human translators process metaphors by incorporating conceptual, cultural, and contextual factors, whereas LLMs tend to rely on paraphrastic approaches. Evaluation results indicate that LLMs are on par with novice translators in accuracy, idiomatic expression, and vividness of metaphor translation, while NMT systems fall slightly short. The study highlights the influence of translation strategies on the quality of metaphor translation and concludes that, while NMT systems and LLMs can achieve performance comparable to that of human translators, studies supported by much larger metaphor-specific datasets are needed to validate the consistency of this performance.

  • Research Article
  • Cited by 1
  • 10.11648/j.ijalt.20251104.12
A Comparative Study on the Translation Quality of Chinese Diplomatic Discourse by NMT and LLMs Based on Multidimensional Quality Metrics
  • Oct 27, 2025
  • International Journal of Applied Linguistics and Translation
  • Dong Lu

Chinese diplomatic discourse plays a crucial role in articulating China’s position and enhancing its influence in global forums. However, machine translation (MT) often struggles with culturally nuanced and abstract expressions, highlighting the need to compare various advanced MT tools. This study assesses and compares the translation quality of Neural Machine Translation (NMT) systems and Large Language Models (LLMs) on Chinese diplomatic texts, focusing on the 2025 China-US tariff statements by China’s Foreign Ministry Spokesperson Lin Jian, with China Daily’s official English versions serving as references. Four NMT tools (Niutrans, Youdao, Google, DeepL) and four LLMs (DeepSeek, Ernie-4.5, ChatGPT-4.0, Gemini) were examined. Using the Multidimensional Quality Metrics (MQM) framework, the study evaluated the translations, especially of phrases such as “奉陪到底” (fight to the end) and “得道多助,失道寡助” (A just cause enjoys abundant support while an unjust one finds little). Results show that the LLMs outperform the NMT systems: 50% of the LLMs (DeepSeek, Ernie-4.5) accurately translated both phrases, while only 25% of the NMT systems (Google) did so for “奉陪到底,” and none did so for “得道多助,失道寡助.” Both types of system showed issues such as undertranslation, omission, and a lack of diplomatic formality. The findings suggest that LLMs have greater potential to handle cultural nuances and abstract content in diplomatic texts, providing insights for enhancing domain-specific MT training and for balancing accuracy and acceptability in conveying Chinese diplomatic messages.
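MQM-based evaluation typically converts annotated errors into a penalty by weighting each error by its severity and normalizing by text length. The abstract does not state which scoring model the study used, so this is a minimal sketch under assumptions: the 1/5/10 severity weights, the per-100-words normalization, and the error labels (which echo the omission and formality problems mentioned above) are all illustrative.

```python
# Assumed severity weights; the abstract does not state the scoring model.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def mqm_score(errors, word_count, per_words=100):
    """Return a quality score out of 100 from annotated MQM errors.

    errors: list of (category, severity) tuples, e.g. ("omission", "major").
    The penalty is normalized per `per_words` source words (an assumption).
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    return 100 - penalty * per_words / word_count


# Hypothetical annotation of one 60-word segment.
errors = [("omission", "major"), ("formality", "minor")]
print(mqm_score(errors, word_count=60))  # 90.0
```

Because weights and normalization vary between MQM scoring models, scores from different studies are only comparable when these settings match.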

  • Research Article
  • 10.59720/24-020
Large Language Models are Good Translators
  • Jan 1, 2024
  • Journal of Emerging Investigators
  • Zhaohan Zeng + 1 more

Machine translation, which uses computers to translate one language into another, is one of the most challenging tasks in artificial intelligence. During the last decade, neural machine translation (NMT), which builds translation models based on deep neural networks, has achieved significant improvements. However, NMT still faces several challenges. For example, the translation quality of an NMT system depends heavily on the amount of bilingual training data, which is expensive to acquire. Furthermore, it is difficult to incorporate external knowledge into an NMT system to obtain further improvement in a specific domain. Recently, large language models (LLMs) have demonstrated remarkable capabilities in language understanding and generation. This raises the questions of whether LLMs can be good translators and whether they can easily be adapted to new domains or specific requirements. In this study, we hypothesized that LLMs can be adapted to perform translation through prompting or fine-tuning, and that these adapted LLMs would outperform a conventional NMT model in four aspects: translation quality, interactive ability, knowledge incorporation ability, and domain adaptation. We compared GPT-4 and Google Translate, as representative LLM and NMT systems respectively, on the WMT 2019 (Fourth Conference on Machine Translation) dataset. Experimental results showed that, with appropriate prompts, GPT-4 outperformed Google Translate in all four aspects. Further experiments on Llama, an open-source LLM developed by Meta, showed that the translation quality of LLMs can be further improved by fine-tuning on a limited bilingual corpus, demonstrating the strong adaptation abilities of LLMs.

  • Research Article
  • Cited by 8
  • 10.58557/(ijeh).v4i2.213
Comparison of Translation Quality between Large Language Models and Neural Machine Translation Systems: A Case Study of Chinese-English Language Pair
  • Apr 17, 2024
  • International Journal of Education and Humanities
  • Xinchen Li

A number of Neural Machine Translation (NMT) systems have already demonstrated their strength in undertaking various translation tasks that are not overly demanding. However, the rapid advancement of AI technology in recent years has endowed Large Language Models (LLMs) with great potential, raising the possibility that they may outperform NMT as translators. To determine whether LLMs perform better than NMT in translation, and how genre and translation direction influence translation quality, this article chose two LLMs, ChatGPT 3.5 and Wenxin Yiyan (ERNIE Bot 3.5), and one NMT system, DeepL, and compared their performance in Chinese-English translation using quantitative methods, namely BLEU scoring and statistical analysis in SPSS. The results show no significant improvement in these LLMs’ translation quality compared with the NMT system; all the chosen systems tend to perform better in non-literary than in literary translation, and produce target texts (TTs) of higher quality in Chinese-English than in English-Chinese translation.
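BLEU, the metric used in this comparison, multiplies a brevity penalty by the geometric mean of clipped (modified) n-gram precisions for n = 1 to 4. The sketch below is a minimal sentence-level version with one reference, whitespace tokenization, and no smoothing. Corpus-level BLEU, as usually reported, instead pools n-gram counts over all segments, and tools such as sacreBLEU standardize tokenization, so this is an illustration rather than a replacement for them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the red mat"))
print(sentence_bleu("the cat sat on the mat", "the cat is on the mat"))  # 0.0
```

The second call shows why sentence-level BLEU is usually smoothed: the pair shares no 4-gram, so the score collapses to zero despite strong unigram overlap.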

  • Research Article
  • 10.25073/2588-1086/vnucsce.231
Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language
  • May 30, 2020
  • VNU Journal of Science: Computer Science and Communication Engineering
  • Nghia-Luan Pham + 1 more

In this paper, we propose a new method for domain adaptation in Statistical Machine Translation (SMT) for low-resource domains in English-Vietnamese translation. Our method uses only monolingual data to adapt the translation phrase table, and the resulting system improves on the SMT baseline. We propose two steps to improve the quality of the SMT system: (i) classifying phrases on the target side of the translation phrase table using a probabilistic classifier model, and (ii) adapting the phrase table by recomputing the direct translation probabilities of phrases.

Our experiments are conducted in the English-to-Vietnamese direction on two very different domains: the legal domain (out-of-domain) and the general domain (in-domain). The English-Vietnamese parallel corpus was provided by the IWSLT 2015 organizers, and the experimental results showed that our method significantly outperformed the baseline system. Our system improved machine translation quality in the legal domain by up to 0.9 BLEU points over the baseline system,…
Keywords: Machine Translation, Statistical Machine Translation, Domain Adaptation
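In a phrase-based SMT phrase table, the direct translation probability p(e|f) is normally a relative frequency: count(f, e) divided by the sum of count(f, e') over all target phrases e' paired with f. The abstract does not give the paper's recomputation formula. As a purely hypothetical illustration, the sketch scales each pair's count by a classifier's probability that the target phrase belongs to the target domain, then renormalizes. The function name, the weighting scheme, and the toy Vietnamese entries are all assumptions.

```python
from collections import defaultdict


def recompute_direct_probs(phrase_counts, domain_prob):
    """Re-estimate p(e|f) after weighting phrase-pair counts by domain.

    phrase_counts: {(source_phrase, target_phrase): count}
    domain_prob:   {target_phrase: P(target domain | target_phrase)},
                   e.g. from a classifier trained on monolingual data.
    The weighting scheme is a hypothetical illustration.
    """
    weighted = {
        (f, e): count * domain_prob[e] for (f, e), count in phrase_counts.items()
    }
    totals = defaultdict(float)
    for (f, _), w in weighted.items():
        totals[f] += w
    # Relative frequency over all target phrases e' paired with the same f.
    return {
        (f, e): (w / totals[f] if totals[f] > 0 else 0.0)
        for (f, e), w in weighted.items()
    }


counts = {("contract", "hợp đồng"): 8, ("contract", "khế ước"): 2}
domain = {"hợp đồng": 0.9, "khế ước": 0.2}  # hypothetical classifier output
print(recompute_direct_probs(counts, domain))
```

Without reweighting, the two entries would get 0.8 and 0.2. The hypothetical classifier shifts probability mass toward the phrase it judges more typical of the target domain.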

  • Research Article
  • Cited by 1
  • 10.1057/s41599-026-06630-4
Exploring AI’s performance in literary autobiography translation: how closely do AI models match human translation
  • Mar 7, 2026
  • Humanities and Social Sciences Communications
  • Yingqi Huang + 1 more

AI-based models are transforming the translation industry, with tools like Google Translate’s neural machine translation (NMT-GT) and large language models (LLMs) driving progress. Yet applying these models to literary translation, a field that remains challenging even for experienced human translators, raises important questions: how well can AI replicate the depth and nuance of human translation, and which type of AI (NMT, a general-purpose LLM, or a reasoning-based LLM) better approximates human output? This corpus-based study compares translations by NMT-GT and two LLMs, ChatGPT-4o and OpenAI-o1, with human translations. Our analysis identifies substantial variation across multiple linguistic dimensions, including lexical and syntactic diversity, textbase and situation model, and readability. Results show that ChatGPT-4o aligns most closely with human translations in this literary autobiography case, followed by NMT-GT, while OpenAI-o1 shows the least similarity. These findings suggest that NMT systems do not necessarily fall short of LLMs in approximating human translation. The reasoning-based OpenAI-o1 does not produce a more human-like translation profile than the general-purpose models, with ChatGPT-4o most effectively bridging the gap between human and AI-generated translations.
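The abstract names lexical diversity as one dimension of comparison but not the measure used. As an illustration only, the sketch computes a plain type-token ratio (TTR) and a standardized TTR (STTR), which averages TTR over fixed-size windows because raw TTR falls as texts get longer. Whitespace tokenization, lowercasing, and the default window of 1,000 tokens are assumptions.

```python
def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def standardized_ttr(text, window=1000):
    """Mean TTR over consecutive full windows; falls back to plain TTR
    when the text is shorter than one window."""
    tokens = text.lower().split()
    chunks = [
        tokens[i:i + window]
        for i in range(0, len(tokens) - window + 1, window)
    ]
    if not chunks:
        return type_token_ratio(tokens)
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)


print(type_token_ratio("the cat saw the dog".split()))  # 0.8
print(standardized_ttr("a b a b c c", window=2))
```

Comparing a human translation with an AI output of different length on raw TTR would favour the shorter text. A windowed measure like STTR reduces that bias.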

  • Research Article
  • 10.5296/ijele.v13i2.23057
Translation Quality in an Evolving Paradigm: Neural Machine Translation and Large Language Models in Technical Domains
  • Jul 31, 2025
  • International Journal of English Language Education
  • Zhongming Zhang + 3 more

The emergence of large language models (LLMs) has reshaped machine translation (MT). Although neural machine translation (NMT) systems like Google Translate (GT) remain dominant, systematic comparisons between LLMs and NMT systems across key quality dimensions are still limited, especially in specialised domains such as technical translation. This study aims to compare the translation quality and error subtypes of GT and ChatGPT-4 in Chinese-English technical manual translation. Eighty paragraph-level segments from Chinese product manuals were translated by both systems. Two trained annotators evaluated the outputs using a Likert scale across four MQM-based dimensions: accuracy, fluency, terminology, and style. Inter-rater agreement was tested, and qualitative data analysis was conducted using NVivo. Results indicated that ChatGPT-4 outperformed GT across all dimensions, delivering higher-quality translation, whereas GT frequently exhibited errors such as redundancy, stilted phrasing, non-standard terminology, and formality mismatches. ChatGPT-4, however, occasionally produced over-translation and semantic overgeneralisation, compromising terminological precision. Despite its superior performance, ChatGPT-4 still poses certain risks: its context-driven outputs may introduce inferential or stylistic deviations, especially in specialised terminology. For high-stakes technical content, expert revision is recommended to ensure semantic fidelity and terminological consistency.

  • Conference Article
  • Cited by 13
  • 10.18653/v1/w15-4110
Baidu Translate: Research and Products
  • Jan 1, 2015
  • Zhongjun He

In this presentation, I would like to introduce Baidu's machine translation research and products. As the largest Chinese search engine, Baidu released its machine translation system in June 2011. It now supports translation among 27 languages on multiple platforms, including PCs and mobile devices. A hybrid translation approach is important for building an Internet translation system. Translation demands on the Internet come from various domains, including news wires, patents, poems, and idioms, and it is difficult for a single translation system to achieve high accuracy on all of them. Hybrid translation is therefore needed in practice. Generally, we build a statistical machine translation (SMT) system using training corpora automatically crawled from the web. For idioms (e.g. “有志者,事竟成”, “where there is a will, there is a way”) and hot words or expressions (e.g. “一带一路”, “One Belt and One Road”), example-based translation methods are used. To improve the translation of dates (e.g. “2012年7月6日”, “July 6, 2012”), numbers (e.g. “三千五百万”, “thirty-five million”), and similar expressions, rule-based methods are used as pre-processing. To improve translation quality for resource-poor language pairs, we use pivot-based methods. Wu and Wang (2007) proposed the triangulation method, which combines the source-pivot and pivot-target phrase tables to induce a source-target phrase table. To fill the data gap between the source-pivot and pivot-target corpora, Wu and Wang (2009) employed a hybrid method combining RBMT and SMT systems. We also proposed a method that uses a Markov random walk to discover implicit relations between phrases in the source and target languages (Zhu et al., 2013), improving the coverage of phrase pairs. We used the co-occurrence frequency of source-target phrase pairs to estimate phrase translation probabilities (Zhu et al., 2014).
On May 20th this year, we launched a neural machine translation (NMT) system for Chinese-English translation. The system performs end-to-end translation with a source-language encoder and a target-language decoder, both of which are recurrent neural networks. The strength of NMT is that it can learn semantic and structural translation information by taking global context into account. We further integrated the SMT and NMT systems to improve translation quality. We also released offline translation packs for the NMT system on mobile devices, providing translation services when the Internet is unavailable. As far as we know, this is the first NMT system supporting offline translation on mobile devices. We also investigate the problem of learning a machine translation model that can simultaneously translate sentences from one source language into multiple target languages (Dong et al., 2015). Our solution is inspired by the recently proposed neural machine translation model, which casts machine translation as a sequence learning problem. We train a unified neural machine translation model under a multi-task learning framework in which the encoder is shared across language pairs and each target language has a separate decoder. Under this framework, the model converges faster and better for both resource-rich and resource-poor language pairs. Based on the above techniques, we have released translation products for multiple platforms, including web translation on PC, apps on mobile devices, and a free API for third-party developers. Our system now supports translation among 27 languages, not only including many frequently used foreign languages, but also
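The triangulation method of Wu and Wang (2007) induces source-target phrase probabilities by marginalizing over pivot phrases: p(t | s) = Σ_p p(t | p) · p(p | s), which assumes the target phrase depends on the source only through the pivot. The sketch covers only this phrase-probability step; full systems also triangulate lexical weights and prune the resulting table. The toy Chinese-English-Spanish entries are hypothetical.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Induce p(t | s) = sum over pivots p of p(t | p) * p(p | s).

    src_to_pivot: {source: {pivot: p(pivot | source)}}
    pivot_to_tgt: {pivot: {target: p(target | pivot)}}
    """
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for piv, p_piv_given_src in pivots.items():
            # Pivot phrases absent from the pivot-target table contribute nothing.
            for tgt, p_tgt_given_piv in pivot_to_tgt.get(piv, {}).items():
                src_to_tgt[src][tgt] += p_tgt_given_piv * p_piv_given_src
    return {src: dict(tgts) for src, tgts in src_to_tgt.items()}


zh_en = {"书": {"book": 0.8, "letter": 0.2}}
en_es = {"book": {"libro": 0.9, "carta": 0.1}, "letter": {"carta": 1.0}}
print(triangulate(zh_en, en_es))
```

Here "carta" collects probability through both pivots (0.8 × 0.1 + 0.2 × 1.0 = 0.28). Summing over shared pivot phrases is what lets triangulation build a table for a pair with no direct parallel data.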

  • Dissertation
  • 10.32657/10356/157475
Neural machine translation with limited resources
  • Jan 1, 2022
  • Tasnim Mohiuddin

With the advent of deep neural networks in recent years, Neural Machine Translation (NMT) systems have achieved state-of-the-art performance on standard translation benchmarks. NMT translates from one language to another with a single neural network trained end to end. NMT models emerged quickly, and within a few years of research they outperformed traditional statistical systems. Despite their success on standard benchmarks, NMT models have some notable limitations. One is that they are data-hungry: they tend to work very well only when a massive amount of parallel training data (a.k.a. bitext) is available, and perform poorly when data is limited. Except for some mainstream languages, e.g., English, French, or Chinese, most natural languages are low-resource and lack large parallel data. Moreover, acquiring large bitext corpora is not viable in most scenarios, especially under resource-constrained conditions such as low-resource languages. Researchers have made numerous efforts to extend the success of NMT from high-resource to low-resource languages, such as transfer learning, data augmentation, and pivoting. However, these still require strong cross-lingual signals, i.e., lots of parallel data. One solution might be to transfer cross-lingual signals through cross-lingual word embeddings (CLWEs), which can be learned from monolingual data in an unsupervised way or with the help of a small seed dictionary. CLWEs seem very promising for resource-constrained machine translation (MT). Most of the successful and predominant CLWE methods (a.k.a. word translation methods) learn a linear mapping function based on the isomorphism assumption, which is problematic.
We hypothesize that learning the cross-lingual mapping in a projected latent space would give the model enough flexibility to induce the required geometric structures, making the embeddings easier to align. Based on this hypothesis, we propose two novel models for learning CLWEs and show empirically that they are particularly effective for low-resource languages. We then turn our attention from word-level to sentence-level translation with limited resources. Specifically, we focus on data augmentation strategies, widely used in NLP and computer vision, to increase the robustness of models in resource-constrained scenarios. We thoroughly investigate the domain-mismatch issue that hinders the all-embracing success of existing techniques in NMT. Eventually, we introduce a novel data augmentation framework for low-resource NMT that leverages the neighboring samples of the original parallel data without explicitly using additional monolingual data. Our framework can diversify the in-domain parallel data in a controlled way. We perform extensive experiments on four low-resource language pairs comprising data from different domains, and show that our method is comparable to traditional back-translation, which uses extra in-domain monolingual data. Typically, NMT systems are trained on heterogeneous data from different domains, sources, topics, styles, and modalities, and the quality of the data also varies widely. Usually, during training, all the data are concatenated and randomly shuffled. However, not all of it may be useful: some data may be redundant, and some might even be noisy and detrimental to final NMT performance. These problems are more acute in low-resource languages than in high-resource ones. We therefore explore curriculum training for NMT systems, i.e., presenting the data to the NMT systems in a systematic order during training.
We introduce a two-stage curriculum training framework for NMT in which we fine-tune a base NMT model on subsets of data. To select the data subsets, we propose two scoring approaches: deterministic scoring using pre-trained methods, and online scoring that considers the prediction scores of the emerging NMT model. Our curriculum strategies consistently yield better translation quality and faster convergence (approximately 50% fewer updates) for both high- and low-resource languages.
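The deterministic branch of the two-stage curriculum can be pictured as follows: score every sentence pair once with a fixed scorer, then fine-tune the base model on the highest-scoring subset. In this sketch, `select_subset` and the length-agreement scorer are hypothetical stand-ins, since the abstract does not detail the thesis's pre-trained scoring methods. The fine-tuning step itself is not shown.

```python
def select_subset(pairs, score_fn, keep_fraction):
    """Keep the top `keep_fraction` of sentence pairs by a fixed score.

    Scores are computed once, before fine-tuning (the "deterministic" case);
    ties keep their original order because Python's sort is stable.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(pairs, key=score_fn, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]


def length_agreement(pair):
    """Hypothetical stand-in scorer: penalize source/target length mismatch."""
    src, tgt = pair
    return -abs(len(src.split()) - len(tgt.split()))


corpus = [
    ("a b c", "x y z"),
    ("a b", "x y z w v"),
    ("a", "x"),
    ("a b c d", "x"),
]
subset = select_subset(corpus, length_agreement, keep_fraction=0.5)
print(subset)  # [('a b c', 'x y z'), ('a', 'x')]
```

An online variant would recompute `score_fn` from the current model's predictions between fine-tuning rounds instead of scoring once up front.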

  • Research Article
  • Cited by 39
  • 10.3390/info14100574
Translation Performance from the User’s Perspective of Large Language Models and Neural Machine Translation Systems
  • Oct 19, 2023
  • Information
  • Jungha Son + 1 more

The rapid global expansion of ChatGPT, which plays a crucial role in interactive knowledge sharing and translation, underscores the importance of comparative performance assessments in artificial intelligence (AI) technology. This study addresses this issue by exploring and contrasting the translation performance of large language models (LLMs) and neural machine translation (NMT) systems. To this end, the APIs of Google Translate, Microsoft Translator, and OpenAI’s ChatGPT were used with parallel corpora from the Workshop on Machine Translation (WMT) 2018 and 2020 benchmarks. Using recognized evaluation metrics such as BLEU, chrF, and TER, a comprehensive performance analysis was conducted across a variety of language pairs, translation directions, and reference token sizes. The findings reveal that while Google Translate and Microsoft Translator generally surpass ChatGPT in BLEU, chrF, and TER scores, ChatGPT performs better on specific language pairs. Translations from non-English languages into English consistently yielded better results across all three systems than translations from English into non-English languages. Notably, translation system performance improved as the token size increased, hinting at the potential benefits of training models on larger token sizes.
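chrF, one of the three metrics used here, compares character n-grams (up to order 6 by default), averages precision and recall over the n-gram orders, and combines them with an F-score that weights recall more heavily (β = 2). The following is a minimal sentence-level sketch. Whitespace is dropped as in common implementations, but edge-case handling differs from sacreBLEU's chrF, which should be used for reported scores.

```python
from collections import Counter


def char_ngrams(text, n):
    """Character n-gram counts with whitespace removed."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((h & r).values())  # clipped matches (min of counts)
        precisions.append(overlap / sum(h.values()) if h else 0.0)
        recalls.append(overlap / sum(r.values()) if r else 0.0)
    p = sum(precisions) / max_n
    rec = sum(recalls) / max_n
    if p + rec == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta score scaled to 0-100; beta = 2 weights recall more heavily.
    return 100 * (1 + b2) * p * rec / (b2 * p + rec)


print(chrf("the cat sat", "the cat sits"))
```

Because chrF works on characters rather than words, it gives partial credit for morphological variants such as "sat" and "sits". BLEU would count those as complete mismatches.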

  • Research Article
  • 10.1515/phras-2025-0006
Measuring Creative Phraseology in Literature: Machine Translation Systems Versus Large Language Models
  • Nov 11, 2025
  • Yearbook of Phraseology
  • Laura Noriega-Santiáñez + 1 more

In a growing digital scenario where phraseology has become aware of technological realities, literary translation is timidly testing sophisticated AI-based tools. This study aims to assess the quality of the output rendered by neural machine translation (NMT) systems, i.e., DeepL and Google Translate, and large language models (LLMs), i.e., ChatGPT and Gemini, in the English>Spanish translation of five comparative idioms extracted from literary texts. To this end, professional literary translators and translation undergraduates evaluate their output against human translation (HT), following the parameters proposed by Corpas Pastor and Noriega-Santiáñez (2024) to measure creativity in the translation of multiword expressions: adequacy (morphosyntactic, semantic, and pragmatic) and novelty. The findings show that HT stands out, although NMT systems performed best in morphosyntactic adequacy. LLMs, especially ChatGPT, show promising creative results. This study therefore serves as a reflection on the use of technologies for translating creative phraseology in literature.

  • Research Article
  • 10.17507/jltr.1702.28
Exploring the Role of Large Language Models in Translation Education: A Systematic Review
  • Mar 2, 2026
  • Journal of Language Teaching and Research
  • Anas M Alkhofi

Despite the surge of research interest in generative AI and the rapid public adoption of large language models (LLMs), their role in translation remains unclear. The reliability of these systems and their limitations as machine translation tools continue to be a central concern for translation teachers and students. Systematic reviews that specifically examine LLMs in translation are still scarce. This systematic review aims to address this gap by synthesizing and interpreting recent empirical studies on the use of LLMs in translation across three areas: (1) LLMs’ translation quality, (2) LLM-generated translation feedback, and (3) the integration of LLMs into translation education. Drawing on 55 empirical studies, the findings show that LLMs, particularly GPT, consistently outperform conventional neural MT systems. For general, non-specialized texts, their output often approaches human quality, though human translators maintain a clear advantage in culturally dense, technical, or literary content. Evidence further indicates that LLMs can provide helpful and timely feedback that identifies common linguistic issues, which in turn can assist both teachers and students; however, teacher feedback remains superior in depth, contextual sensitivity, and clarity. As contemporary translation workplaces increasingly rely on MT and AI-supported tools, training students to work with LLMs has become essential for aligning classroom practice with professional expectations. At the same time, educators must balance LLM-assisted tasks with hands-on human translation to ensure that students continue to develop essential linguistic and problem-solving skills.

  • Research Article
  • 10.5445/ir/1000104498
Multilingual Neural Translation
  • Feb 14, 2020
  • Repository KITopen (Karlsruhe Institute of Technology)
  • Thanh-Le Ha


  • Research Article
  • Cited by 11
  • 10.1075/target.21147.kol
Human and machine translation of occasionalisms in literary texts
  • Apr 3, 2023
  • Target
  • Waltraud Kolb + 2 more

Literary occasionalisms, new words coined by writers with a particular poetic aim in view, often pose a great challenge for translators. Given recent advances in machine translation (MT), could literary translators benefit from MT when it comes to the translation of occasionalisms? We address this question by considering the work of Austria’s most important nineteenth-century comedy writer, Johann Nestroy (1801–1862). We compare how human translators and two generic neural MT systems (Google Translate, DeepL) translated occasionalisms (compounds, derivations, and blends) in Nestroy’s play Der Talisman into English. While human translators largely refrained from creating new target expressions, the two MT systems generated a number of viable new coinages, most of them by literal translation procedures. In an interactive human-computer environment, using MT output as a repository from which to retrieve novel target solutions or derive inspiration might open up new avenues in the practice of literary translation.
