Bilingual Corpus Research Articles

Query translation approaches in Cross-Language Information Retrieval (CLIR) have frequently relied on dictionaries, which suffer from translation ambiguity. Moreover, word-by-word query translation is not sufficient. In this paper, we propose, evaluate and compare a new possibilistic approach for query translation in order to improve on previous dictionary-based ones. This approach uses a probability-to-possibility transformation as a means of introducing further tolerance into the query translation process. First, we identify noun phrases (NPs) in the source query and translate them as units using translation patterns and a language model. Second, source query terms that are not included in any selected NP are translated word by word using our new possibilistic approach to single-word translation. In doing so, we take into account all query words and their translations when choosing the suitable translation of a given word. We start from the idea that suitable translations of query terms tend to co-occur in target-language documents, unlike unsuitable ones. Finally, to increase the coverage of the bilingual dictionary, additional words and their translations are automatically generated from a parallel bilingual corpus. We tested our approach using the French-English parallel text corpus Europarl and the CLEF-2003 French-English CLIR test collection. The reported experiments assess the performance of the probability-to-possibility transformation-based approach relative to the probabilistic one and to some state-of-the-art CLIR tools.
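As an illustration of the probability-to-possibility idea mentioned in this abstract, here is a minimal Python sketch. The abstract does not say which transformation the authors use, so this assumes one well-known choice, the Dubois-Prade transformation, in which the possibility of an outcome is the sum of the probabilities of all outcomes that are no more probable than it. The function name `probability_to_possibility` and the candidate translations with their probabilities are hypothetical and chosen only for the example.

```python
from typing import Dict


def probability_to_possibility(probs: Dict[str, float]) -> Dict[str, float]:
    """Dubois-Prade transformation (an assumed choice, not confirmed by the abstract).

    pi(x) = sum of p(y) over all y with p(y) <= p(x).
    The most probable outcome always receives possibility 1.
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize so the input behaves as a probability distribution.
    normalized = {k: v / total for k, v in probs.items()}
    return {
        k: sum(q for q in normalized.values() if q <= p)
        for k, p in normalized.items()
    }


# Hypothetical translation probabilities for one ambiguous source word.
candidates = {"lawyer": 0.6, "avocado": 0.3, "advocate": 0.1}
possibilities = probability_to_possibility(candidates)

for word, degree in possibilities.items():
    print(f"{word}: {degree:.2f}")
```

With these made-up inputs, the less probable candidates receive possibility degrees at least as high as their probabilities (roughly 0.4 and 0.1 against a top value of 1.0), which is one way to read the "further tolerance" the abstract refers to.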

Far from being restricted to exchanges between experts, specialized knowledge is mediated to audiences with different levels of specialization, from scientific reviews to newspaper articles. This diversity constitutes an often-overlooked challenge for translators. While documentation and terminology are always crucial, translation decisions are based on communicative parameters as well as cognitive and linguistic criteria. Although it is self-evident that linguistic choices are determined by the proficiency level of the readership, few authors have attempted to specify what those choices are and how the correlation operates, most notably in popularization discourse, and none of them has considered potential differences between languages and cultural settings. The focus of the paper is a bilingual (French and Spanish) corpus study carried out on newspaper articles dealing with stem cell research and cloning published in four different geographic regions (France, Quebec, Spain, Argentina). An original methodology was implemented for data collection and analysis. The number and nature of expressions used to convey each concept were then analyzed. Discursive strategies widely assumed to be a hallmark of popularization, such as definitions and explanations, were also taken into account. Indices of metaphorical conceptualization and the underlying modes of conceptualization were identified. This study contributes concrete data to a debate that remains largely theoretical, and supports the conception of specialized communication as a continuum. The results go against well-established ideas about popularized texts, especially regarding the trademark status of “didactic features.” It seems imperative to acknowledge the heterogeneity of popularization and to consider the role of textual genre constraints in the way specialized knowledge is introduced. Furthermore, the data obtained seem to substantiate the recent questioning of the canonical view of popularization as a mere translation.

Articles published on Bilingual Corpus

Bayesian Word Learning in Multiple Language Environments.

Does empirical data from bilingual and native Spanish corpora meet linguistic theory? The role of discourse context in variation of subject expression

Linguistic-Relationships-Based Approach for Improving Word Alignment

Reframing translated news for target readers: a narrative account of news translation in Snowden’s discourses

Synchrony issues in comics. Language transfer and gender-specific characterisation in English translations of Greek Aristophanic comics

Translating Low-Resource Languages by Vocabulary Adaptation from Close Counterparts

Translation Quality Estimation Using Only Bilingual Corpora

Acquiring Chinese paraphrases based on random walk of $N$ steps

La expresión lingüística de la valoración en textos jurisprudenciales: Estudio contrastivo francés-español

Neural Networks Classifier for Data Selection in Statistical Machine Translation

Cross-lingual neighborhood effects in generalized lexical decision and natural reading.

A SYNTACTIC ANALYSIS OF THE ENGLISH DISCOURSE MARKER ONLY AND ITS VIETNAMESE TRANSLATIONAL EQUIVALENTS

COMPREHENSIVE APPROACH FOR BILINGUAL MACHINE TRANSLATION

The Feasibility of Content and System Morpheme Hierarchy in the Analysis of Tamazight Bilingual Corpora: The Case of Kabyle and Mzabi Bilingual Speech in Oran

Leveraging bilingually-constrained synthetic data via multi-task neural networks for implicit discourse relation recognition

Building Laos Dependency Treebank by Means of Chinese-Laos Bilingual Corpus of Word Alignment

Towards a new possibilistic query translation tool for cross-language information retrieval

Translation and Popularization: Medical Research in the Communicative Continuum

POS-tagging a bilingual parallel corpus: methods and challenges

Chinese temporal relation resolution based on Chinese-English parallel corpus
