How well can state-of-the-art machine translation systems render a 16th-century Chinese novel?
This study evaluates the performance of state-of-the-art machine translation systems in rendering Journey to the West, a culturally rich 16th-century Chinese novel, into Portuguese. Employing a mixed-methods approach, we compare translations produced by DeepSeek-V3, GPT-4o, DeepL Pro, and NovelTrans-J against a published human translation. Quantitative assessments conducted by an expert evaluator examine accuracy, fluency, stylistic elegance, cultural appropriateness, and overall translation quality at both the sentence and chunk levels. The results reveal that three MT systems (DeepSeek-V3, GPT-4o, and NovelTrans-J) produce translations of comparable or superior quality to the human translation. Among them, NovelTrans-J consistently outperforms all other participants, particularly in terms of cultural appropriateness. In contrast, DeepL Pro demonstrates significantly weaker performance across all evaluated dimensions. To complement the quantitative analysis, a qualitative investigation focuses on the rendering of culture-specific items (CSIs). NovelTrans-J exhibits outstanding performance, producing the fewest mistranslations and uniquely providing explanatory notes that facilitate reader comprehension. DeepSeek-V3 and GPT-4o also handle CSIs competently, though with less consistency, while DeepL Pro struggles considerably, showing a high rate of CSI mistranslations and generally low quality. Interestingly, the human translation also contains notable CSI-related errors, particularly in cases involving semantically opaque expressions, an area in which all participants encounter significant difficulty. These findings underscore the growing potential of MT systems to handle complex, culturally rich literary texts, although certain challenges, such as the translation of semantically opaque expressions, remain significant obstacles. 
We hope this study provides an updated perspective on the current capabilities of MT and offers practical insights to guide the development of future systems that can more accurately capture and transmit the distinctive cultural nuances embedded in literary works.
- Book Chapter
- 10.1007/978-3-030-66196-0_1
- Jan 1, 2020
Even though machine translation (MT) systems have shown promise for automatic translation, the quality of translations produced by MT systems still falls far short of professional human translations (HTs) because of the complexity of grammar and word usage in natural languages. As a result, HTs are still commonly used in practice. Nevertheless, the quality of HTs depends strongly on the skills and knowledge of the translators. Measuring the quality of translations produced by MT systems and human translators automatically has faced many challenges. The traditional way of manually checking translation quality with bilingual speakers is expensive and time-consuming. Therefore, we propose an unsupervised method to assess HT and MT quality without access to any labelled data. We compare a range of methods that can automatically grade the quality of HTs and MTs, and observe that the Bidirectional Minimum Word Mover’s Distance (BiMWMD) obtains the best performance on both the HT and MT datasets.
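The distance metric named above can be illustrated with a toy sketch. The following is a deliberately simplified approximation of a bidirectional minimum word mover's distance, not the authors' exact BiMWMD formulation: each token is matched to its nearest neighbour in the other sentence, the distances are averaged, and the result is symmetrized. The `emb` vectors are hypothetical stand-ins for pretrained word embeddings.

```python
import numpy as np

def min_wmd(a, b, emb):
    """Average distance from each token in `a` to its nearest token in `b`."""
    return sum(min(np.linalg.norm(emb[w] - emb[v]) for v in b) for w in a) / len(a)

def bi_min_wmd(a, b, emb):
    """Symmetrized (bidirectional) minimum word mover's distance."""
    return 0.5 * (min_wmd(a, b, emb) + min_wmd(b, a, emb))

# Hypothetical toy embeddings; real use would load pretrained vectors.
emb = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.9, 0.1]),
    "car": np.array([0.0, 1.0]),
}
```

A lower score indicates a candidate closer to the reference; identical sentences score exactly 0.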
- Research Article
- 10.54254/2753-7064/2025.20639
- Jan 24, 2025
- Communications in Humanities Research
Since Google introduced the Transformer model into natural language processing (NLP) in 2017, AI-aided translation has rapidly advanced. At the same time, translation is evolving from a solitary endeavor into a cooperative activity between human translators and machine translation systems, epitomized by the emergence of platforms with a Machine Translation Post-Editing (MTPE) function. The advent of new translation modes has also led to increased research evaluating the effectiveness and quality of machine translation, for example, studies on translation quality under the Multidimensional Quality Metrics (MQM) error typology framework. Involving AI-based translators and MTPE in their workflow enables human translators to prepare engineering documents efficiently. However, researchers have noticed that most machine translators find it difficult to capture the semantic and cultural differences in the source language and to generate coherent, well-structured translations in the target language. This research examines ChatGPT's application in tender document translation under the MQM framework, hoping to shed light on the assessment of ChatGPT's translation quality, the identification of ChatGPT's errors in translating such documents, and suggestions for human translators' performance throughout MTPE.
- Research Article
22
- 10.1097/phh.0b013e3182a95c87
- Sep 1, 2014
- Journal of Public Health Management and Practice
Most local public health departments serve limited-English-proficiency groups but lack sufficient resources to translate the health promotion materials they produce into different languages. Machine translation (MT) with human postediting could fill this gap and work toward decreasing health disparities among non-English speakers. The aims were to (1) identify the time and costs associated with human translation (HT) of public health documents, (2) determine the time necessary for human postediting of MT, and (3) compare the quality of postedited MT and HT. A quality comparison of 25 MT and HT documents was performed with public health translators. The public health professionals involved were queried about the workflow, costs, and time for HT of 11 English public health documents over a 20-month period. Three recently translated documents of similar size and topic were then machine translated, the time for human postediting was recorded, and a blind quality analysis was performed. The study was set in Seattle/King County, Washington, with public health professionals as participants. Main outcome measures were (1) estimated times for various HT tasks; (2) observed postediting times for MT documents; (3) actual costs for HT; and (4) comparison of quality ratings for HT and MT. Human translation via local health department methods took 17 hours to 6 days. While HT throughput ranged from 1.58 to 5.88 words per minute, MT plus human postediting achieved 10 to 30 words per minute. The cost of HT ranged from $130 to $1220; MT required no additional costs. A quality comparison by bilingual public health professionals showed that MT and HT were equally preferred. MT with human postediting can reduce the time and costs of translating public health materials while maintaining quality similar to HT. In conjunction with postediting, MT could greatly improve the availability of multilingual public health materials.
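The reported throughput ranges translate directly into time estimates. A back-of-the-envelope sketch, assuming a hypothetical 1,000-word document and using only the words-per-minute ranges quoted above:

```python
# Throughput ranges (words per minute) reported in the study above.
HT_WPM = (1.58, 5.88)   # human translation
MT_WPM = (10.0, 30.0)   # machine translation plus human postediting

def minutes_range(words, wpm):
    lo, hi = wpm
    # The fastest rate yields the shortest time; the slowest, the longest.
    return words / hi, words / lo

words = 1000  # hypothetical document length
ht_best, ht_worst = minutes_range(words, HT_WPM)
mt_best, mt_worst = minutes_range(words, MT_WPM)
# HT: roughly 170 to 633 minutes; MT + postediting: roughly 33 to 100 minutes.
```

Even at the slow end of the MT range, postedited MT finishes well before the fast end of the HT range, which is consistent with the study's conclusion.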
- Research Article
- 10.33542/jti2025-s-3
- Jan 1, 2025
- SKASE Journal of Translation and Interpretation
In this paper, we present the results of a study evaluating intralingual machine translations of health information texts into Plain German. In our study, we compare machine-translated simplified texts with those simplified manually by human translators, as well as with the original, unsimplified texts. We compare the output of four different machine translation systems and assess the translation quality using various criteria, including translation errors, readability, and syntactic complexity. The study reveals that of the four analysed machine translation systems, ChatGPT performed worst. Our results also suggest that fine-tuning a model with task-specific and domain-specific data improves translation quality.
- Research Article
2
- 10.61200/mikael.129675
- Dec 1, 2010
- Mikael: Kääntämisen ja tulkkauksen tutkimuksen aikakauslehti
Translation quality can be evaluated with regard to different aspects, such as accuracy (fidelity), fluency and fitness for purpose. In using a machine translation system for information purposes, accuracy of semantic content is the key aspect of quality. Automated quality metrics developed in the machine translation field have been criticized for conflating fluency of form with accuracy of content and for failing to provide any information on the types of errors in the translations. Our research aims to discover criteria for assessing translation quality specifically in terms of accuracy of semantic content in translation. This paper demonstrates how an error analysis with a view to identifying different error types in machine translations can serve as a starting point for such criteria. The error classification described focuses on mismatches of semantic components (individual concepts and relations between them) in the source and target texts. We present error analysis results, which show differing patterns between human translators and machine translation systems on the one hand, and between two different kinds of machine translation systems on the other.
- Research Article
8
- 10.58729/1941-6687.1122
- Jun 3, 2014
- Communications of the IIMA
INTRODUCTION Expert human translation still surpasses the best results of machine translation (MT) systems (Bar-Hillel, 2003), but it is often hard to schedule an interpreter at the spur of the moment, especially for relatively obscure languages. Several free, fully automatic, Web-based translation services are available to fill this need, but at the expense of lower accuracy. However, many translations do not need to be perfect. For example, a reader of a Web page or an email message written in a foreign language might need to get only the gist of the passage before deciding whether more detailed human translation is needed or the content is not important enough to proceed further with it. That is, a quickly produced translation of poorer accuracy can have greater value than a more accurate one that arrives too late (Muegge, 2006). As a result, more words are now translated per year using MT than by human translators, and the demand continues to grow (LISA, 2009). Few studies have been conducted on the relative accuracies of these Web-based services, however. The purpose of this paper is to provide a performance overview of four leading MT systems provided on the Web and to further assess the accuracy of the best. Prior Studies of Web-Based MT Systems Machine translation was first proposed in 1947, and the first demonstration of a translation system took place in January 1954 (Hutchins, 2003). MT became available for personal computers in 1981, and in 1997, Babel Fish (using SYSTRAN) appeared as the first free translation service on the World Wide Web (Yang & Lange, 1998). Although several evaluation studies have been conducted on MT systems (e.g., NIST, 2008), based upon an extensive review of the literature, only a few have focused solely upon Web-based versions.
For example, four have tested the accuracy of SYSTRAN (originally provided at http://babelfish.altavista.com/babelfish; now at http://babelfish.yahoo.com/): Study 1 (Aiken, Rebman, Vanjani, & Robbins, 2002): In one of the earliest studies of a Web-based MT system, four participants used SYSTRAN to automatically translate German, French, and English comments in an electronic meeting. After the meeting, two objective reviewers judged the overall accuracy of the translations to be about 50%, while the understanding accuracy was about 95%. Study 2 (Aiken, Vanjani, & Wong, 2006): In another study, a group of 92 undergraduate students evaluated SYSTRAN translations of 12 Spanish text samples into English, and they failed to understand only two of the 12 translations (83% accuracy). No significant differences in understandability were found based on gender, but those who reported understanding some Spanish understood many of the English translations better. Further, accuracy did not seem to correlate with the complexity of the sentences. Study 3 (Yates, 2006): In a third study, 20 sentences (10 Spanish, 10 German) selected from Mexican and German civil codes and press releases from foreign ministries were translated into English with SYSTRAN, and the author evaluated the samples' accuracies. The system's performance was rated as poor, but it was not uniformly poor; i.e., the German texts were translated less poorly than the Spanish ones. Study 4 (Ablanedo, Aiken, & Vanjani, 2007): In a final study, 10 English text samples were translated by an expert and an intermediate-level Spanish translator as well as by SYSTRAN. The most fluent human was 100% accurate, and the other achieved 80% accuracy. The MT system achieved only 70% accuracy but was 195 times faster than the humans. All of these tests were based upon SYSTRAN, the system deemed most reliable at the time of the studies. However, new translation software appeared on Google in October 2007.
Abandoning the rule-based algorithms of SYSTRAN which the site had used previously, Google Translate (http://translate. …
- Research Article
- 10.17721/studling2024.24.101-110
- Jan 1, 2024
- Studia Linguistica
Machine translation systems permeate all spheres of human activity, including the field of literary translation. Scholars note that machine translation holds significant potential for development and is less labor-intensive compared to the work of human translators. Consequently, there is a need to analyze common errors in machine translation to help avoid them in the future as neural machine translation systems continue to evolve. This article examines and analyzes the quality of a Ukrainian translation of the Spanish short story “Amigos” by Argentine writer Julio Cortázar, performed by the machine translation system DeepL. The translated text was compared with the official Ukrainian translation done by a human translator. The translated text was analyzed for errors, and the types of errors made by the DeepL system were identified. Additionally, the number of errors was counted. The study employed methods of analysis, synthesis, and comparison of the original and translated texts. Considering the number and significance of the errors in terms of their impact on the essence of the text within narrow and broad contexts, we concluded that the quality of the story’s translation is relatively high. It was determined that using machine translation for translating literary works, particularly short prose, is potentially feasible and effective, provided that the translated text is subsequently edited by a human translator. To improve the quality of machine translation of literary texts, recommendations were developed to enhance the performance of the DeepL machine translation system.
- Dissertation
3
- 10.4995/thesis/10251/17174
- Oct 3, 2012
The main goal of this thesis is to develop computer-assisted translation and machine translation systems that present a more robust synergy with their potential users. Hence, the main purpose is to make current state-of-the-art systems more ergonomic, intuitive and efficient, so that the human expert feels more comfortable when using them. To this end, different techniques are presented, focusing on improving the adaptability and response time of the underlying statistical machine translation systems, as well as a strategy aiming at enhancing human-machine interaction within an interactive machine translation setup. All of this serves the ultimate purpose of filling the existing gap between the state of the art in machine translation and the tools usually available to human translators. Concerning the response time of the machine translation systems, a parameter pruning technique is presented, whose intuition stems from the concept of bilingual segmentation, but which evolves towards a full parameter re-estimation strategy. By using such a strategy, the experimental results presented here prove that it is possible to achieve reductions of up to 97% in the number of parameters required without a significant loss in translation quality. Being robust across different language pairs, these results evidence that the pruning technique presented is effective in a traditional machine translation scenario, and could be used, for instance, in a post-editing setup. Nevertheless, experiments carried out within a simulated interactive machine translation environment are slightly less convincing, since a trade-off between response time and translation quality is needed. Two orthogonally different approaches are presented with the purpose of increasing the adaptability of the statistical machine translation systems.
On the one hand, we investigate how to increase the adaptability of the language model by subdividing it into several smaller language models, which are then interpolated at translation time according to the source sentence to be translated. The specific sub-models are built either by taking advantage of supervised information present in certain bilingual corpora, or by performing unsupervised clustering on the training set, with the aim of uncovering specific sub-topics or language styles. On the other hand, Bayesian predictive adaptation is elucidated as an efficient strategy for adapting the translation models present in state-of-the-art machine translation systems. Although adaptation experiments are only performed within the traditional machine translation framework, the results obtained are compelling enough for implementing them within an interactive setup, and such work will be done in the near future. Nevertheless, it should be noted that the techniques developed may be readily implemented within a computer-assisted translation scenario, in which a statistical machine translation system provides the translations that the user needs to modify and validate. Finally, special attention is devoted to increasing the synergy between the human expert and the interactive machine translation system. With this purpose, two different forms of weaker feedback are studied, which are intended to increase the productivity of the human translator. To this end, two different changes to the traditional interaction scheme are presented. The first one aims at anticipating the user's actions, and the second one is targeted at increasing the flexibility of the system whenever the user signals an error they want the system to correct.
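The language-model interpolation described above can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: the sub-models here are hypothetical toy unigram tables, and the interpolation weights (which the thesis selects per source sentence) are fixed by hand.

```python
def interpolate(models, weights, word):
    # Linear interpolation of sub-model probabilities: P(w) = sum_i lambda_i * P_i(w)
    assert abs(sum(weights) - 1.0) < 1e-9, "interpolation weights must sum to 1"
    return sum(lam * m.get(word, 0.0) for lam, m in zip(weights, models))

# Hypothetical sub-models for two styles uncovered by clustering.
legal_lm = {"court": 0.30, "the": 0.20}
news_lm = {"goal": 0.25, "the": 0.20}

# Weights would be chosen per source sentence; fixed here for illustration.
p = interpolate([legal_lm, news_lm], [0.7, 0.3], "court")  # 0.7*0.30 + 0.3*0.0 = 0.21
```

A real system would interpolate full n-gram models and tune the weights against the source sentence, but the combination rule is the same linear mixture.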
- Research Article
1
- 10.15688/jvolsu2.2024.5.1
- Dec 27, 2024
- Vestnik Volgogradskogo gosudarstvennogo universiteta. Serija 2. Jazykoznanije
The article discusses some current issues in how modern machine translation systems (MT systems) interpret out-of-vocabulary words, in the context of changing forms and ways of maintaining an automatic dictionary. It provides a critical outline of the typology of MT systems and strategies for their development. It describes the impact of fast-developing software and technologies on these strategies and analyzes the changes they bring to the forms of dictionary support. The research shows that the linguistic support and the structure of automatic dictionaries, whatever the MT system, are fundamentally important for ensuring translation quality. Despite all the success of neural MT (NMT) systems, their automatically updated vocabulary databases do not record words characterized by terminological specificity and low frequency in the special texts and text corpora on which the system is trained. Analysis of translations performed by two popular NMT systems – Google Translate and Yandex Translate – has proven that they fail to process and unify the translation of words that are not entered in the system dictionaries, a task that used to be solved easily by users of all types of MT systems with the help of automatic dictionaries. With statistics-based automatic dictionaries it remains a pressing problem and requires a special approach when editing MT results.
- Research Article
25
- 10.1007/s10590-006-9015-5
- Nov 2, 2006
- Machine Translation
This paper presents an extended, harmonised account of our previous work on combining subsentential alignments from phrase-based statistical machine translation (SMT) and example-based MT (EBMT) systems to create novel hybrid data-driven systems capable of outperforming the baseline SMT and EBMT systems from which they were derived. In previous work, we demonstrated that while an EBMT system is capable of outperforming a phrase-based SMT (PBSMT) system constructed from freely available resources, a hybrid ‘example-based’ SMT system incorporating marker chunks and SMT subsentential alignments is capable of outperforming both baseline translation models for French–English translation. In this paper, we show that similar gains are to be had from constructing a hybrid ‘statistical’ EBMT system. Unlike the previous research, here we use the Europarl training and test sets, which are fast becoming the standard data in the field. On these data sets, while all hybrid ‘statistical’ EBMT variants still fall short of the quality achieved by the baseline PBSMT system, we show that adding the marker chunks to create a hybrid ‘example-based’ SMT system outperforms the two baseline systems from which it is derived. Furthermore, we provide further evidence in favour of hybrid systems by adding an SMT target-language model to the EBMT system, and demonstrate that this too has a positive effect on translation quality. We also show that many of the subsentential alignments derived from the Europarl corpus are created by either the PBSMT or the EBMT system, but not by both. In sum, therefore, despite the obvious convergence of the two paradigms, the crucial differences between SMT and EBMT contribute positively to the overall translation quality. 
The central thesis of this paper is that any researcher who continues to develop an MT system using either of these approaches will benefit further from integrating the advantages of the other model; dogged adherence to one approach will lead to inferior systems being developed.
- Dissertation
- 10.6092/unibo/amsdottorato/9191
- Mar 30, 2020
The present work is a feasibility study on the application of Machine Translation (MT) to institutional academic texts, specifically course catalogues, for Italian-English and German-English. The first research question of this work focuses on the feasibility of profitably applying MT to such texts. Since the benefits of good-quality MT might be counteracted by translators' preconceptions towards the output, the second research question examines translator trainees' trust in an MT output as compared to a human translation (HT). Training and test sets are created for both language combinations in the institutional academic domain. The MT systems used are ModernMT and Google Translate. Overall evaluations of the output quality are carried out using automatic metrics. Results show that applying neural MT to institutional academic texts can be beneficial even when bilingual data are not available. When small amounts of sentence pairs become available, MT quality improves. Then, a gold-standard data set with manual annotations of terminology (MAGMATic) is created and used for an evaluation of the output focused on terminology translation. The gold standard was publicly released to stimulate research on terminology assessment. The assessment proves that domain adaptation improves the quality of term translation. To conclude, a method to measure trust in a post-editing task is proposed and results regarding translator trainees' trust towards MT are outlined. All participants are asked to work on the same text. Half of them are told that it is an MT output to be post-edited, and the other half that it is an HT needing revision. Results prove that there is no statistically significant difference between post-editing and HT revision in terms of the number of edits and temporal effort. Results thus suggest that a new generation of translators who have received training on MT and post-editing is not influenced by preconceptions against MT.
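Edit counts of the kind compared above can be measured mechanically. A minimal sketch using Python's standard difflib; the study's actual edit metric is not specified here, so this is only an illustrative stand-in:

```python
import difflib

def count_edits(draft, final):
    """Count word-level edit operations (insert/delete/replace) between two versions."""
    sm = difflib.SequenceMatcher(a=draft.split(), b=final.split())
    return sum(1 for tag, *_ in sm.get_opcodes() if tag != "equal")

# Comparing a raw draft against its post-edited (or revised) version:
n = count_edits("translation draft by the MT system",
                "translation draft by the system")  # one word deleted -> 1 edit
```

Counting opcodes rather than changed characters keeps the measure comparable across texts of different lengths; a per-word normalization could be added on top.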
- Research Article
- 10.11648/j.ajcst.20250802.14
- Jun 18, 2025
- American Journal of Computer Science and Technology
This paper examines the output of culture-specific items (CSIs) generated by ChatGPT 3.5 and ChatGPT Pro in response to three prompts to translate three anthologies of African poetry. The first prompt was broad, the second focused on poetic structure, and the third emphasized cultural specificity. To support this analysis, six comparative tables were created. The first and second tables present the results of the CSIs produced by ChatGPT 3.5 and ChatGPT Pro, respectively, after the three prompts; the third table categorizes the unchanged CSIs based on Aixelá’s framework of “Proper nouns and Common expressions”; the fourth summarizes the CSIs generated by the human translators, a custom-built translation engine (CTE), and the two versions of a Large Language Model (LLM). The fifth table shows how the seven unrepeated CSIs were rendered in the French translation after the three prompts. The sixth table shows the strategies employed by ChatGPT 3.5 and ChatGPT Pro, after the culture-specific prompt, on the CSIs that were not translated. Compared to the outputs of CSIs from the reference human translation (HT) and the CTE in prior studies, the findings indicate that the culture-oriented prompts used with ChatGPT Pro did not yield significant enhancements in the CSIs during the translation of the three African poetry anthologies from English to French. On evaluation, however, ChatGPT Pro scored better on BLEURT than ChatGPT 3.5. A combined total of 20 CSIs were generated by the LLM versions, of which 13 were repeated as the source word. The repeated CSIs were inconsistent with the outcome of the HT and CTE; some of the translations of the remaining seven unrepeated CSIs were also inaccurate compared to the reference HT and CTE. 
While the corpus of this investigation is small, the results suggest that the data used to build LLMs has been neither French-centric nor poetry-domain-specific, and thus LLMs could achieve better performance when tailored to other languages and specific domains.
- Research Article
- 10.31004/jele.v10i3.900
- May 16, 2025
- Journal of English Language and Education
Translation of metaphors in song lyrics presents a unique challenge, especially in the context of human versus machine translation. This study investigates how metaphors in Maher Zain’s Insha Allah and Harris J’s Worth It are interpreted and translated differently by human translators and machine translation tools such as Google Translate and DeepL. Using a descriptive qualitative approach with a comparative analysis method, this research identifies the translation techniques used and evaluates how effectively the metaphorical meanings and emotional tones are conveyed in both types of translations. The findings reveal that machine translators tend to translate metaphors literally, often missing cultural and emotional nuances. In contrast, human translators employ strategies like adaptation and paraphrasing to preserve implied meanings and emotional resonance. Furthermore, machine translation performs better on culturally universal metaphors but fails to accurately render idiomatic and emotionally rich metaphors. This study highlights the ongoing relevance of human translators in conveying metaphorical meaning in creative texts and contributes to the broader discourse on translation quality and technique in the age of artificial intelligence.
- Research Article
- 10.52547/jfl.9.39.81
- Mar 1, 2021
- International Journal of Foreign Language Teaching and Research
The present descriptive study aimed at investigating the human and machine Persian translations of The Kite Runner by Khaled Hosseini, and comparing the translation strategies applied to culture-specific items (CSIs) in the translated texts. To this end, based on Newmark’s (1988) categorization, the applied strategies were identified in the two translations and compared. The obtained results showed that Naturalization and Transposition were the most frequently used strategies by both the human translators and the machine translation. The results also showed that the machine translation could not present a comprehensible translation due to overuse of these strategies (75%). It was further revealed that the spirit of the original text was not lost in the translated versions due to the closeness of Iranian and Afghan cultures. In fact, the translated versions kept the real beauty and creativity of the original work. However, the remorseful theme of the source text was kept intact to a great extent in the human translation of the novel, while the machine translation lost it. Thus, the general impression is that culture-specific terms make it difficult for machine translation to achieve complete word-for-word and semantic equivalence, and that the human translator must have a broad knowledge of the literature and traditions of both the source and target languages.
- Conference Article
5
- 10.3115/980845.980886
- Jan 1, 1998
The previous English-Korean MT system, which was transfer-based and applied only to written text, enumerated the following brief list of problems that did not seem easy to solve in the near future: 1) processing of non-continuous idiomatic expressions; 2) reduction of the many ambiguities in English syntactic analysis; 3) robust processing of failed or ill-formed sentences; 4) selection of the correct word correspondence among several alternatives; 5) generation of Korean sentence style. These problems can be considered factors that influence the translation quality of a machine translation system. This paper describes symbolic and statistical hybrid approaches to solving the problems of the previous English-to-Korean machine translation system in terms of improving translation quality. The solutions have been successfully applied to the web-based English-Korean machine translation system FromTo/EK, which has been under development since 1997.