
Parallel Corpus Research Articles

Overview

1553 articles published in the last 50 years

Related Topics

  • Bilingual Corpus
  • Comparable Corpora
  • Monolingual Corpora
  • Parallel Sentences
  • Parallel Texts
  • Bilingual Dictionaries
  • Language Corpora

Articles published on Parallel Corpus

1532 search results, sorted by recency
Differentiated Interpersonal Functions of Metadiscursive Nouns in Cross-Cultural Contexts: A Corpus-Based Analysis

Drawing on Hyland’s model of interactive metadiscourse and Brown & Levinson’s politeness theory, this study investigates the cross-cultural divergences and evolutionary mechanisms of interpersonal functions of metadiscursive nouns (MNs) in Chinese and English academic writing. Utilizing a diachronic bilingual parallel corpus (2015–2025) comprising 200 journal articles from applied linguistics in each language (totaling 1.5 million tokens), this study systematically analyzes how MNs mediate writer-reader interactions across linguistic and cultural boundaries. The study proposes a tripartite analytical framework of Rhetorical Identity-Cognitive Interaction-Cultural Adaptation to unravel the co-evolutionary mechanisms underlying MNs’ divergences, driven by disciplinary norms, cognitive schemata, and rhetorical traditions. Amplifier MNs (e.g., evidence) occur more frequently in English than in Chinese, predominantly through N+be+complement clause constructions to bolster propositional authority; hedging MNs are more prevalent in Chinese, frequently co-occurring with epistemic modals to form double-hedging patterns, reflecting Confucian “prudent speech” principles in safeguarding positive face. In the cognitive interaction dimension, disciplinary paradigm shifts drive diachronic divergence: interpretative MNs (e.g., interpretation) in English exhibit an increase, reflecting a post-positivist interpretative turn; empirical MNs (e.g., data) in Chinese show a surge, reinforcing the institutionalization of quantitative methodologies in scholarly practice. In the rhetorical identity dimension, English deploys explicit self-references (we) and reader-oriented MNs (implication) to construct dialogic authority; Chinese favors impersonal MNs (e.g., this study) for depersonalized persuasion, with a higher density than English self-referential markers, highlighting collectivist cultural constraints on individual rhetorical agency.
This study breaks through Western-centric frameworks of metadiscourse interpretation, confirming that cross-cultural differences in MNs are essentially the product of the co-evolution of academic institutions, cognitive habits, and rhetorical traditions. It provides theoretical grounding and methodological support for constructing an inclusive global academic rhetoric system.
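Comparisons like "occur more frequently in English than in Chinese" rest on normalizing raw counts to a common base such as occurrences per million tokens. A minimal sketch of that normalization, using hypothetical counts rather than the study's actual figures:

```python
# Frequency normalization for cross-corpus comparison -- a minimal
# sketch; the counts below are hypothetical, not the study's data.

def per_million(raw_count: int, corpus_tokens: int) -> float:
    """Raw frequency normalized to occurrences per million tokens."""
    return raw_count / corpus_tokens * 1_000_000

# Hypothetical counts of an amplifier MN such as "evidence",
# assuming the 1.5M-token corpus is split evenly by language:
english = per_million(1200, 750_000)
chinese = per_million(450, 750_000)

print(f"English: {english:.1f} per million tokens")
print(f"Chinese: {chinese:.1f} per million tokens")
```

Normalizing this way makes counts comparable even when the two subcorpora differ in size.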

  • Journal: Scientific Journal Of Humanities and Social Sciences
  • Published: May 11, 2025
  • Author: Shipeng Song

Metaphorization in the translation of political texts

Abstract The translation strategy known as “metaphorization” involves adding a metaphor to the target text when the source text contains either a non-metaphorical expression or nothing at all. This strategy has been identified by Toury (1995, p. 81) in the two abovementioned categories as “non-metaphor into metaphor” and “0 into metaphor”, alongside other typical strategies such as retaining the original metaphor. While there has been research on metaphor translation strategies, the phenomenon of metaphorization remains relatively unexplored. Metaphorization highlights the extra effort translators make beyond conveying what is already in the source text. It is therefore interesting to explore what metaphors translators add to their translations, to understand the contexts in which translators utilize metaphorization, and to discuss the potential influence of metaphorization as such. Taking political discourse as a grounding owing to its richness in metaphor and expressiveness in communicating ideology, this paper seeks to provide some tentative insights into these questions based on a parallel corpus comprising Chinese source texts and English target texts. Drawing on recent developments in cognitive metaphor theories, the phenomenon of metaphorization is analyzed from linguistic, cognitive, and communicative perspectives.

  • Journal: Across Languages and Cultures
  • Published: May 5, 2025
  • Author: Lei Zhang + 1

A Study on Korean Correspondences of the Chinese Particle ‘了’ Based on a Chinese-Korean Parallel Corpus

Based on a parallel corpus of Chinese and Korean newspapers and TV dramas, the present study analyzes the multiple meanings of the Chinese particle ‘le’ in daily life and its corresponding expressions in Korean. It is found that compared to written language, the usage frequency of ‘le’ is higher and its meanings are more diverse in spoken language. In written language, ‘le’ indicating the completion of an action or carrying an object meaning mainly corresponds to the Korean past tense ending ‘-aht/eot-’, while in spoken language, ‘le’ corresponds to the Korean past tense ending regardless of its meaning. The ‘le’ expressing a change of state, an imminent action, or a weakened ‘la’ meaning usually has no corresponding expression in Korean. Compared to written language, the expressions corresponding to ‘le’ in Korean are more diverse in spoken language, including sentence endings, attributives, and conjunctive endings. This study provides an important basic data source for Chinese-Korean language teaching, translation, and comparative research.

  • Journal: The Research Society for the Korean Language Education
  • Published: Apr 30, 2025
  • Author: Wenhua Li

Translating into Topic Chains in International Laws: a Corpus-based Study of English-Chinese Translation Shift

Abstract This study explores discourse restructuring in English-Chinese legal translation, focusing on the translation shift to Chinese-specific topic chain clauses. Using data from a parallel English-Chinese legal corpus, this research reveals that the shift from English “subject-prominent” clause combining to Chinese topic chaining is influenced by the textual status of the English grammatical subject. Parameters for determining textual status include semantic animacy, discourse saliency, and position within the schematic structure of the genre. The more marked the English grammatical subject is according to these parameters, the higher its textual status, and the more likely it is to be translated into Chinese topic-chaining clauses. The study argues that topic chain translation accounts for universal translation shifts, including explicitation, implicitation, and reference-tracking shifts. It proposes that topic chain translation is a “hybridization” strategy adopted by translators to preserve the specific legal coherence of the source text while aligning with the linguistic and communicative norms of the target language. This makes translation shifts predictable. Viewing “translation as legal action” emphasizes the recreation of meaning, considering extra-linguistic factors relevant to the genre-specific discourse community. Topic chain translation offers a practical approach to achieving equivalent communicative purposes in the target genre, thereby facilitating the “text translation” of morphologically structured languages like English into Chinese, which is characterized by a freer word order. The corpus and empirical findings of this study can inform the teaching of legal translation and the development of algorithms in machine translation to assist in generating or annotating appropriate clause-combining structures.
The study also calls for further investigations into discourse restructuring in legal translation, particularly at the intersection of language contrasts, legislative contexts, and genre conventions, where the effective recreation of international legal reasoning occurs.

  • Journal: Contrastive Pragmatics
  • Published: Apr 29, 2025
  • Author: Shijie Liu + 1
  • Open Access

Research on machine translation of ancient books in the era of large language model

The powerful cross-linguistic capability of large language models has advanced machine translation research. Building on the Xunzi series of large language models for ancient Chinese books, this paper explores instruction fine-tuning of domain-oriented large models for machine translation in the vertical domain of ancient books. A parallel corpus of 1.2 million classical-to-vernacular Chinese sentence pairs was acquired, comprising 300,000 traditional-script pairs and 900,000 simplified-script pairs, from which an instruction fine-tuning dataset for the large language model was constructed. Six models are selected for instruction fine-tuning, and three different metrics are used to evaluate their performance. The evaluation results show that, compared with the generalized base models, the Xunzi series models improve on all metrics, with the Xunzi-Baichuan2-7B model performing best. Finally, the Xunzi-Baichuan2-7B model is fine-tuned with all parameters, and the translation results are analyzed.
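Instruction fine-tuning datasets of the kind described above are commonly stored as JSONL records pairing an instruction with an input/output sentence pair. A sketch of that construction step; the field names follow a widespread convention and are an assumption, not the Xunzi project's actual schema:

```python
import json

# Building instruction-tuning records for classical-to-vernacular
# Chinese translation -- field names ("instruction", "input",
# "output") are assumptions, not the project's actual schema.

def make_record(classical: str, vernacular: str) -> dict:
    return {
        "instruction": "Translate the following classical Chinese into modern Chinese.",
        "input": classical,
        "output": vernacular,
    }

pairs = [("学而时习之", "学习并且时常温习它")]  # one toy sentence pair
lines = [json.dumps(make_record(c, v), ensure_ascii=False) for c, v in pairs]
print(lines[0])
```

Each line of the resulting JSONL file is one training example for supervised fine-tuning.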

  • Journal: npj Heritage Science
  • Published: Apr 28, 2025
  • Author: Zhixiao Zhao + 3
  • Open Access

The Role of Corpus Linguistics in Contemporary Linguistics Research and Translation Studies

The article presents a systematic review of research papers on corpus linguistics as an innovative direction in empirical linguistics. It reveals the theoretical and methodological foundations of the field, defines the specifics of corpus linguistics in comparison with computational linguistics, and highlights modern trends in corpus research as well as the advantages of employing textual corpus data in linguistic studies. It is noted that the corpus method and its resources are actively used today in linguistic research on many world languages, and that language corpora are differentiated by volume, type, structure, content, purpose, etc. The article provides an overview of such corpora as the British National Corpus (BNC), the American National Corpus (ANC), the Corpus of Contemporary American English (COCA), the Russian National Corpus (RNC), the Modern Chinese Language Corpus created in the Center for Chinese Linguistics at Beijing University, and the Balanced Corpus of the Chinese Language. The specifics of parallel and comparative corpora are noted, including the Child Language Data Exchange System (CHILDES), the International Comparable Corpus (ICC), and the Corpus of English Wikipedia (CEW). The differences between parallel and comparative corpora are also outlined. Prospects for national corpora development lie in new research into almost every area of applied and theoretical linguistics, as well as in scrutiny and further development of translation theory and practice. The characteristics of monolingual, comparative, and parallel corpora are highlighted in the context of their role in linguistic research. It is mentioned that, in addition to parallel corpora, translators' tools also include monolingual corpora, which provide additional material on the subject of translation and enhance the translator's background knowledge.

  • Journal: Vestnik Volgogradskogo gosudarstvennogo universiteta. Serija 2. Jazykoznanije
  • Published: Apr 24, 2025
  • Author: Haitong Pei

Semantic Role Labeling in Neural Machine Translation Addressing Polysemy and Ambiguity Challenges

The persistent challenges of polysemy and ambiguity continue to hinder the semantic accuracy of Neural Machine Translation (NMT), particularly in language pairs with distinct syntactic structures. While transformer-based models such as BERT and GPT have achieved notable progress in capturing contextual word meanings, they still fall short in understanding explicit semantic roles. This study aims to address this limitation by integrating Semantic Role Labeling (SRL) into a Transformer-based NMT framework to enhance semantic comprehension and reduce translation errors. Using a parallel corpus of 100,000 English-Indonesian and English-Japanese sentence pairs, the proposed SRL-enhanced NMT model was trained and evaluated against a baseline Transformer NMT. The integration of SRL enabled the model to annotate semantic roles, such as agent, patient, and instrument, which were fused with encoder representations through semantic-aware attention mechanisms. Experimental results demonstrate that the SRL-integrated model significantly outperformed the standard NMT model, improving BLEU scores by 6.2 points (from 32.5 to 38.7), METEOR scores by 6.3 points (from 58.5 to 64.8), and reducing the TER by 5.8 points (from 45.1 to 39.3). These results were statistically validated using a paired t-test (p < 0.05). Furthermore, qualitative analyses confirmed SRL's effectiveness in resolving lexical ambiguities and syntactic uncertainties. Although SRL integration increased inference time by 12%, the performance trade-off was deemed acceptable for applications requiring higher semantic fidelity. The novelty of this research lies in the architectural fusion of SRL with transformer-based attention layers in NMT, a domain seldom explored in prior studies. Moreover, the model demonstrates robust performance across linguistically divergent language pairs, suggesting its broader applicability. 
This work contributes to the advancement of semantically aware translation systems and paves the way for future research in unsupervised SRL integration and multilingual scalability.
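The SRL annotation step described above pairs each source token with a semantic role (agent, patient, instrument) before the representations are fused in the encoder. A minimal sketch of that tagging step only; the paper's actual fusion uses semantic-aware attention inside the Transformer, which is not reproduced here, and the example sentence and role labels are illustrative:

```python
# Pairing source tokens with their SRL roles prior to encoding --
# a sketch of the annotation step, not the paper's attention fusion.
# PropBank-style labels: ARG0 = agent, ARG1 = patient, ARG2 = instrument.

def tag_tokens(tokens: list[str], roles: list[str]) -> list[str]:
    """Attach each token's semantic role (O = no role)."""
    assert len(tokens) == len(roles)
    return [f"{tok}|{role}" for tok, role in zip(tokens, roles)]

tokens = ["The", "chef", "cut", "the", "bread", "with", "a", "knife"]
roles  = ["O", "ARG0", "V", "O", "ARG1", "O", "O", "ARG2"]
print(tag_tokens(tokens, roles))
```

Tagged sequences like this make the agent/patient distinction explicit, which is what helps the model resolve the ambiguous cases the abstract mentions.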

  • Journal: Journal of Technology Informatics and Engineering
  • Published: Apr 21, 2025
  • Author: Yan Qin

Didactic potential of working with DIY corpora and text mining approaches in literary translation training

ABSTRACT Corpus analysis methods have been widely employed in literary translation research by numerous scholars. However, their integration into literary translation training has yet to be developed. With the advancement of AI technology, this paper explores the potential of employing AI-enhanced corpus text analysis and text mining techniques in this context. We propose a curriculum design that combines corpus and text mining methods structured in three stages. In the first stage, students in a literary translation course learn to build their own DIY parallel corpus, using distant reading tools for a comparative analysis of the linguistic and stylistic features of the source text and its translated texts. The next stage introduces text mining techniques for paratextual analysis of translations, including exploration tasks such as named entity recognition, topic modelling, keyword extraction, text summarisation, and sentiment analysis. Based on the findings from corpus searches and text mining analyses, students develop a retranslation plan. The final stage involves the presentation of their retranslations and corresponding paratexts. The effectiveness of this course is assessed through evaluations and student feedback, highlighting the value of integrating text mining approaches in literary translation training.

  • Journal: The Interpreter and Translator Trainer
  • Published: Apr 17, 2025
  • Author: Yi-Ping Wu + 2

Enhancing Cross Language for English-Telugu pairs through the Modified Transformer Model based Neural Machine Translation

Cross-Language Translation (CLT) refers to conventional automated systems that generate translations between natural languages without human involvement. As most resources are available in English, multilingual translation is badly needed to carry the essence of education to the deep roots of society. Neural machine translation (NMT) is one such intelligent technique, usually deployed for efficient translation from one language to another. However, NMT techniques require a large corpus of data to achieve an improved translation process. This bottleneck restricts NMT to mid-resource languages compared to its dominant English counterparts. Although some languages benefit from established NMT systems, creating one for low-resource languages is a challenge due to their intricate morphology and the lack of parallel data. To overcome this problem, this research article proposes a modified transformer architecture for NMT to improve translation efficiency. The proposed NMT framework consists of an encoder-decoder architecture built on an enhanced transformer with multiple fast feed-forward networks and multi-headed soft attention networks. The designed architecture extracts word patterns from a parallel corpus during training, forming an English–Telugu vocabulary via Kaggle, and its effectiveness is evaluated using measures such as Bilingual Evaluation Understudy (BLEU), character-level F-score (chrF), and Word Error Rate (WER). To demonstrate the merit of the proposed model, extensive comparisons between the proposed and existing architectures are made and their performance metrics analysed. Outcomes show that the proposed architecture improves NMT, achieving a BLEU of 0.89 and a lower WER than the existing models.
These experimental results provide a strong foundation for further experimentation with multilingual NMT.
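Of the metrics listed above, WER is the simplest to state precisely: the word-level Levenshtein distance between hypothesis and reference, divided by the reference length. A standard implementation, with an illustrative sentence pair:

```python
# Word Error Rate (WER): word-level edit distance / reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 edits over 6 words
```

Lower is better; a perfect translation scores 0.0, and scores above 1.0 are possible when the hypothesis is much longer than the reference.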

  • Journal: International Journal of Computational and Experimental Science and Engineering
  • Published: Apr 16, 2025
  • Author: Vaishnavi Sadula + 1

Exploring the Styles of Chinese and English Translators of One Hundred Years of Solitude from the Perspective of Cohesion

Numerous studies on translator style have been conducted ever since Mona Baker advocated the application of corpus linguistics to the research into the style of literary translators. However, few studies focus on the styles of translators of different languages of an original novel. This study attempts to compare the styles of the Chinese and English translators of the Spanish novel One Hundred Years of Solitude. It adopts corpus-driven and corpus-based approaches to unveiling the similarities and differences between the two translators. Both a monolingual corpus and a trilingual parallel corpus were built to achieve this goal. Taking conjunctions as linguistic triggers for the comparison, the present study explores the Chinese and English translations of the five most frequent conjunctions in the first two chapters of the well-reputed Spanish original novel. The statistics of concordance lines show that there are similarities and differences between the two translators with regard to translation style.

  • Journal: English Language and Literature Studies
  • Published: Apr 16, 2025
  • Author: Zhanfeng Hu + 1

A Parallel Corpus-Based Study on a Collocation Exploration Method for L2 (Korean Sign Language) Education for L1 (Korean) Speakers

Objectives This study explores a method for identifying collocations in Korean Sign Language (KSL) using a Korean-KSL parallel corpus and examines its applicability to L1 (Korean) speakers learning KSL as an L2. Methods The Korean-KSL parallel corpus was preprocessed using Python, and high-frequency word pairs were extracted based on N-grams using #LancsBox. To systematically outline the collocation exploration process, the study focused on the most frequent preceding word, {돕다} (to help), and applied Log Dice and Delta P metrics to assess collocational strength, leading to the selection of collocation candidates. Results The application of the proposed collocation exploration method based on a parallel corpus enabled the identification of collocation candidates at the two-word level. Additionally, certain expressions that exhibited weak collocation potential at the two-word level were confirmed as collocations when expanded to three-word sequences. Moreover, the use of Log Dice and Delta P in the word-unit analysis quantitatively measured differences in word order between Korean and KSL, highlighting the significance of bidirectional collocational strength analysis. Conclusions The study confirmed the validity of the proposed collocation exploration method and demonstrated that word-unit analysis incorporating collocational strength is essential for identifying collocation candidates, beyond simple frequency-based approaches. Furthermore, the findings empirically show that collocation learning contributes not only to vocabulary acquisition but also to reducing word order errors in KSL education. Future research will expand the range of central words and genres to enhance the collocation list and establish a foundation for developing more refined educational materials for KSL learners.
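The two association measures the study applies, logDice and Delta P, can be computed directly from contingency counts: f(x,y) for the co-occurring pair, f(x) and f(y) for the individual words, and N for the corpus size. A sketch with hypothetical counts (not drawn from the Korean-KSL corpus):

```python
import math

# Collocational-strength metrics: logDice (symmetric, max 14) and
# Delta P (directional). Counts below are hypothetical examples.

def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2*f(x,y) / (f(x) + f(y)))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def delta_p(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Delta P(y|x) = P(y|x) - P(y|not-x), a directional measure."""
    return f_xy / f_x - (f_y - f_xy) / (n - f_x)

# Hypothetical counts for a node word and a candidate collocate:
print(round(log_dice(30, 120, 80), 3))
print(round(delta_p(30, 120, 80, 10_000), 3))
```

Because Delta P is directional, computing it in both directions captures exactly the word-order asymmetries between Korean and KSL that the study highlights; logDice, being symmetric, does not.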

  • Journal: Korean Association For Learner-Centered Curriculum And Instruction
  • Published: Apr 15, 2025
  • Author: Sunju Noh

Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models

Recently, Large Language Models (LLMs) with in-context learning have demonstrated remarkable potential in handling neural machine translation. However, existing evidence shows that LLMs are prompt-sensitive and it is sub-optimal to apply the fixed prompt to any input for downstream machine translation tasks. To address this issue, we propose an adaptive few-shot prompting (AFSP) framework to automatically select suitable translation demonstrations for various source input sentences to further elicit the translation capability of an LLM for better machine translation. First, we build a translation demonstration retrieval module based on LLM's embedding to retrieve top-k semantic-similar translation demonstrations from aligned parallel translation corpus. Rather than using other embedding models for semantic demonstration retrieval, we build a hybrid demonstration retrieval module based on the embedding layer of the deployed LLM to build better input representation for retrieving more semantic-related translation demonstrations. Then, to ensure better semantic consistency between source inputs and target outputs, we force the deployed LLM itself to generate multiple output candidates in the target language with the help of translation demonstrations and rerank these candidates. Besides, to better evaluate the effectiveness of our AFSP framework on the latest language and extend the research boundary of neural machine translation, we construct a high-quality diplomatic Chinese-English parallel dataset that consists of 5,528 parallel Chinese-English sentences. Finally, extensive experiments on the proposed diplomatic Chinese-English parallel dataset and the United Nations Parallel Corpus (Chinese-English part) show the effectiveness and superiority of our proposed AFSP.
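The retrieval step described above, selecting the top-k most similar translation demonstrations for a given source input, reduces to ranking stored examples by embedding similarity. A minimal sketch with toy vectors standing in for the LLM's embedding layer; the vectors and sentence pairs are illustrative only:

```python
import math

# Top-k demonstration retrieval by cosine similarity -- a sketch of
# the retrieval step, with toy 2-d vectors in place of real embeddings.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, demos, k=2):
    """demos: list of (embedding, (src, tgt)); return the k most similar pairs."""
    ranked = sorted(demos, key=lambda d: cosine(query_vec, d[0]), reverse=True)
    return [pair for _, pair in ranked[:k]]

demos = [
    ([1.0, 0.0], ("你好", "Hello")),
    ([0.9, 0.1], ("早上好", "Good morning")),
    ([0.0, 1.0], ("再见", "Goodbye")),
]
print(top_k([1.0, 0.05], demos, k=2))
```

The retrieved pairs are then placed in the prompt as in-context demonstrations before the source sentence to be translated.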

  • Journal: Proceedings of the AAAI Conference on Artificial Intelligence
  • Published: Apr 11, 2025
  • Author: Lei Tang + 4

Optimizing Seq2Seq LSTM for Regional-to-National language translation on a web platform

Machine translation for low-resource languages remains a significant challenge due to the lack of parallel corpora and optimized model configurations. This study developed and optimized a Seq2Seq Long Short-Term Memory (LSTM) model for Tegalan-to-Indonesian translation. A manually curated parallel corpus was constructed to train and evaluate the model. Various hyperparameter configurations were systematically tested, with the best-performing model achieving a BLEU score of 11.7381 using a dropout rate of 0.5, batch size of 64, learning rate of 0.01, and 70 training epochs. The results demonstrated that higher dropout rates, smaller batch sizes, and longer training durations enhanced model generalization and translation accuracy. The optimized model was deployed into a web-based application using Streamlit, ensuring accessibility for real-time translation. The findings highlighted the importance of hyperparameter tuning in neural machine translation for low-resource languages. Future research should explore Transformer-based architectures, larger datasets, and reinforcement learning techniques to further enhance translation quality and generalization.
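The systematic testing of hyperparameter configurations described above amounts to a grid search: enumerate every combination and keep the one with the best validation score. A sketch of that loop; the evaluation function here is a hypothetical lookup standing in for an actual training run, seeded with the best configuration and BLEU score reported in the abstract:

```python
from itertools import product

# Grid search over hyperparameter configurations -- the evaluate()
# below is a stand-in lookup, not the study's training pipeline.

grid = {
    "dropout": [0.3, 0.5],
    "batch_size": [32, 64],
    "learning_rate": [0.001, 0.01],
}

def evaluate(cfg: dict) -> float:
    # Hypothetical BLEU results; only the abstract's best config is known.
    scores = {(0.5, 64, 0.01): 11.7381}
    return scores.get((cfg["dropout"], cfg["batch_size"], cfg["learning_rate"]), 8.0)

configs = [dict(zip(grid, values)) for values in product(*grid.values())]
best = max(configs, key=evaluate)
print(best, evaluate(best))
```

In practice each `evaluate` call would train the Seq2Seq LSTM to completion, which is why such sweeps are usually restricted to a small, carefully chosen grid.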

  • Journal: Journal of Soft Computing Exploration
  • Published: Apr 9, 2025
  • Author: Dwi Intan Af'Idah + 3

Translational reconstruction of iconicity: a corpus-based cognitive stylistic study on the acoustic narration in the English translations of Can Xue’s fiction

Abstract As an important representative of the contemporary Chinese avant-garde, Can Xue writes fiction with a unique language style. The use of onomatopoetic reduplications is one of the major linguistic means of her acoustic narration, which is often closely associated with sound-related metaphoric mappings, shifting of narrative viewpoints, and so on in the fiction. By comparing the various ways of construal and stylistic choices made by different English translators of Can Xue’s fiction based on a comparative and parallel corpus, it is found that transliteration of onomatopoetic reduplications is an effective means of reconstructing iconicity in acoustic narration, which can not only enrich foreignized sensory experiences of readers but also help them better accept the language and culture of the “other”. This also reflects the translator’s creative and personalized cognitive style.

  • Journal: Journal of Literary Semantics
  • Published: Apr 8, 2025
  • Author: Lizhu Zhang

Transformer Hyperparameter Tuning for Madurese-Indonesian Machine Translation

The main problem arising in using Neural Machine Translation (NMT) for the Madurese language is the limitation of training data due to the unavailability of an adequate parallel corpus. In addition, the model must overcome the difference in words caused by the level of politeness in the Madurese language (coarse, moderate, and smooth). The rules-based approach requires many rules to represent these differences. In contrast, the statistical approach relies on the frequency of words in the training data, which cannot accurately capture variations in politeness levels. To overcome this problem, a parallel corpus was created to provide adequate training data, and an embedding matrix based on Skip Gram with Negative Sampling (SGNS) was used to produce better word representations for processing with transformers. This study also employs two types of evaluation: model configuration based on dataset size (large and small) and two tokenization methods (word and subword levels). The best results were obtained with the large dataset using word-level tokenization, achieving 0.70% accuracy for entirely correct text, 78.87% for partially correct text, and a BLEU score ranging from 4.76 to 27.63 with a maximum n-gram value from 1 to 4. This approach improved translation accuracy and shows significant potential for developing NMT systems for languages with limited resources, such as the Madurese language.
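The BLEU scores reported above are built from modified n-gram precision, with n ranging from 1 to 4. A minimal single-reference sketch of that core computation (no brevity penalty, and the Indonesian example sentences are illustrative):

```python
from collections import Counter

# Modified n-gram precision, the core of BLEU -- single reference,
# no brevity penalty. Example sentences are illustrative only.

def ngram_precision(reference: list[str], hypothesis: list[str], n: int) -> float:
    ref_counts = Counter(tuple(reference[i:i + n])
                         for i in range(len(reference) - n + 1))
    hyp_counts = Counter(tuple(hypothesis[i:i + n])
                         for i in range(len(hypothesis) - n + 1))
    # Clip each hypothesis n-gram count by its count in the reference.
    overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    total = sum(hyp_counts.values())
    return overlap / total if total else 0.0

ref = "saya pergi ke pasar".split()
hyp = "saya pergi pasar".split()
for n in range(1, 3):
    print(n, round(ngram_precision(ref, hyp, n), 3))
```

Full BLEU takes the geometric mean of these precisions for n = 1..4 and multiplies by a brevity penalty, which explains why scores drop sharply as the maximum n-gram order increases, as in the 27.63-to-4.76 range reported above.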

  • Journal: Engineering, Technology & Applied Science Research
  • Published: Apr 3, 2025
  • Author: Fika Hastarita Rachman + 4

A comprehensive overview of LLM-based approaches for machine translation

Statistical machine translation (SMT) used parallel corpora and statistical models to identify translation patterns and probabilities. Although this method had advantages, it struggled with idiomatic expressions, context-specific subtleties, and intricate linguistic structures. The subsequent introduction of deep neural networks such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and transformers with attention mechanisms, together with the emergence of large language model (LLM) frameworks, has marked a paradigm shift in machine translation in recent years and has entirely replaced the traditional statistical approaches. LLMs are able to capture complex language patterns, semantics, and context because they have been trained on enormous volumes of text data. Our study summarizes the most significant contributions in the literature related to LLM prompting, fine-tuning, retrieval-augmented generation, improved transformer variants for faster translation, multilingual LLMs, and quality estimation with LLMs. This new research direction guides the development of more efficient and innovative solutions to address the current challenges of LLMs, including hallucinations, translation bias, information leakage, and inaccuracy due to language inconsistencies.

  • Journal: Indonesian Journal of Electrical Engineering and Computer Science
  • Published: Apr 1, 2025
  • Author: Bhuvaneswari Kumar + 1

Designing AI-powered translation education tools: a framework for parallel sentence generation using SauLTC and LLMs

Translation education (TE) demands significant effort from educators due to its labor-intensive nature. Developing computational tools powered by artificial intelligence (AI) can alleviate this burden by automating repetitive tasks, allowing instructors to focus on higher-level pedagogical aspects of translation. This integration of AI has the potential to significantly enhance the efficiency and effectiveness of translation education. The development of effective AI-based tools for TE is hampered by a lack of high-quality, comprehensive datasets tailored to this specific need, especially for Arabic. While the Saudi Learner Translation Corpus (SauLTC), a unidirectional English-to-Arabic parallel corpus, constitutes a valuable resource, its current format is inadequate for generating the parallel sentences required for a didactic translation corpus. This article proposes leveraging large language models like the Generative Pre-trained Transformer (GPT) to transform SauLTC into a parallel sentence corpus. Using cosine similarity and human evaluation, we assessed the quality of the generated parallel sentences, achieving promising results with an 85.2% similarity score using Language-agnostic BERT Sentence Embedding (LaBSE) in conjunction with GPT, outperforming other investigated embedding models. The results demonstrate the potential of AI to address critical dataset challenges in the quest for effective data-driven solutions to support translation education.

  • Journal: PeerJ Computer Science
  • Publication Date: Mar 31, 2025
  • Author: Moneerh Aleedy + 7

A Study on the Characteristics of Chinese Expressions Corresponding to Korean Syntactic Negation Using Parallel Corpus

  • Journal: Journal of the Humanities
  • Publication Date: Mar 31, 2025
  • Author: Pan-Pan Shi + 1

Tibyan corpus: balanced and comprehensive error coverage corpus using ChatGPT for Arabic grammatical error correction

Natural language processing (NLP) commonly augments text data to overcome sample-size constraints; scarce and low-quality data present particular challenges when learning in such domains. Increasing the sample size is a natural and widely used strategy for alleviating these challenges, and data-augmentation techniques are also common in resource-rich languages to address problems such as exposure bias. In this study, we chose Arabic as the target for increasing the sample size and correcting grammatical errors. Arabic is considered a language with limited resources for grammatical error correction (GEC), despite being widely used among Arabs and non-Arabs because of its close connection to Islam. This study therefore develops an Arabic corpus called “Tibyan” for grammatical error correction using ChatGPT. ChatGPT is used as a data-augmentation tool that operates on pairs consisting of an Arabic sentence containing grammatical errors and an error-free counterpart extracted from Arabic books, called a guide sentence. Establishing the corpus involved multiple steps: we collected and pre-processed pairs of Arabic texts from various sources, such as books and open-access corpora, and then used ChatGPT to generate a parallel corpus from the collected text, using the guide sentences to produce sentences with multiple types of errors. By engaging linguistic experts to review and validate the automatically generated sentences, we ensured they were correct and error-free, and the corpus was refined iteratively based on their feedback to improve its accuracy. Finally, we used the Arabic Error Type Annotation tool (ARETA) to analyze the error types in the Tibyan corpus. Our corpus contains 49% errors, spanning seven types: orthography, morphology, syntax, semantics, punctuation, merge, and split. The Tibyan corpus contains approximately 600 K tokens.
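The final analysis step described above, tallying how errors of each type are distributed across the corpus, can be sketched with a simple count over per-sentence annotations. The annotations below are hypothetical examples using the seven categories named in the abstract; a real pipeline would obtain them from an annotation tool such as ARETA:

```python
from collections import Counter

# Hypothetical per-sentence error annotations (seven-category scheme).
annotations = [
    ["orthography", "syntax"],
    ["morphology"],
    [],                                     # error-free sentence
    ["punctuation", "orthography", "split"],
    ["semantics"],
    [],
]

# Tally how often each error type occurs across the corpus.
counts = Counter(tag for sent in annotations for tag in sent)

# Share of sentences containing at least one error.
n_sentences = len(annotations)
n_with_errors = sum(1 for sent in annotations if sent)
error_rate = n_with_errors / n_sentences
```

On a full corpus, the same tally yields the per-type breakdown and the overall error share (49% in Tibyan) that the abstract reports.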

  • Journal: PeerJ Computer Science
  • Publication Date: Mar 31, 2025
  • Author: Ahlam Alrehili + 1
  • Open Access

The Application of Wang Yangming’s Instructions for Practical Living Multilingual Parallel Corpus in University Foreign Language Education on “Human Nature”

Please refer to the URL that includes this article for the abstract.

  • Journal: Asia-Pacific Journal of Humanities and Social Sciences
  • Publication Date: Mar 31, 2025
  • Author: Xiangshu Wu
