Abstract

In this paper, we describe the current state of the art in Statistical Machine Translation (SMT) and reflect on how SMT handles meaning. Statistical Machine Translation is a corpus-based approach to MT: it derives the knowledge required to generate new translations from corpora. General-purpose SMT systems do not use any formal semantic representation. Instead, they directly extract translationally equivalent words or word sequences – expressions with the same meaning – from bilingual parallel corpora. All statistical translation models are based on the idea of word alignment, i.e., the automatic linking of corresponding words in parallel texts. The first-generation SMT systems were word-based. From a linguistic point of view, the major problem with word-based systems is that the meaning of a word is often ambiguous and is determined by its context. Current state-of-the-art SMT systems try to capture local contextual dependencies by using phrases instead of words as units of translation. To solve more complex ambiguity problems (where a broader text scope or even domain information is needed), a Word Sense Disambiguation (WSD) module is integrated into the Machine Translation environment.
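Word alignment of the kind described above is typically learned with expectation–maximization over a parallel corpus. As a rough illustration only (the paper does not specify a particular model; the toy corpus and all names below are hypothetical), the following sketch estimates IBM Model 1-style word-translation probabilities t(f|e) and uses them to link a source word to its most likely target word:

```python
from collections import defaultdict

def train_ibm_model_1(corpus, iterations=10):
    """Estimate word-translation probabilities t(f|e) with EM
    (a simplified IBM Model 1). `corpus` is a list of
    (source_words, target_words) sentence pairs."""
    # Uniform initialisation over all co-occurring word pairs
    t = defaultdict(lambda: 1e-6)
    f_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    for fs, es in corpus:
        for f in fs:
            for e in es:
                t[(f, e)] = uniform
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # marginal counts for each e
        for fs, es in corpus:
            for f in fs:
                # Distribute f's probability mass over the target words
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / z
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise so that sum_f t(f|e) = 1
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy English–German parallel corpus (hypothetical)
corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
t = train_ibm_model_1(corpus)
# "book" aligns most strongly with "buch"
best = max(["das", "buch", "ein", "haus"], key=lambda e: t[("book", e)])
print(best)  # → buch
```

Even on this tiny corpus, EM resolves the ambiguity: "buch" co-occurs with both "the" and "a", but only "book" appears in both of its sentence pairs, so the probability mass concentrates on the correct pair.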

Highlights

  • Statistical Machine Translation (SMT) is one of the best-performing corpus-based approaches to natural language processing (NLP)

  • Although phrase-based SMT systems perform significantly better than word-based systems, they still face many problems

  • Significant improvements on standard MT evaluation metrics have been reported after integrating a disambiguation module into a phrase-based SMT system

Summary

Introduction

Statistical Machine Translation (SMT) is one of the best-performing corpus-based approaches to natural language processing (NLP). The idea of linking the meaning of a word to its context has a long history that starts with the distributional theory of meaning, which links the meaning of a word to its distribution and further states that two words are distributionally similar if they appear in similar contexts. This theory of meaning goes back to Harris’ Distributional Hypothesis (Harris 1968), suggesting a direct link between distributional similarity and semantic similarity: two words that tend to occur in similar contexts tend to have similar meanings. This idea is exploited by lexicographers today, who use corpus evidence for creating dictionaries. The remainder of the paper shows how different generations of Machine Translation systems have tackled the major problems MT is confronted with.
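The Distributional Hypothesis can be made concrete by representing each word as a vector of the words it co-occurs with and comparing those vectors. A minimal sketch (the toy corpus and the window size of 2 are illustrative assumptions, not from the paper):

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=2):
    """Count the words appearing within `window` positions of each
    word, giving a bag-of-words context vector per word."""
    vectors = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vectors.setdefault(w, Counter()).update(ctx)
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Toy corpus: "coffee" and "tea" occur in similar contexts
sents = [
    ["she", "drinks", "hot", "coffee", "every", "morning"],
    ["he", "drinks", "hot", "tea", "every", "morning"],
    ["the", "car", "needs", "new", "tyres", "today"],
]
vecs = context_vectors(sents)
print(cosine(vecs["coffee"], vecs["tea"]) > cosine(vecs["coffee"], vecs["car"]))  # → True
```

Here "coffee" and "tea" share the contexts {drinks, hot, every, morning} and so come out as distributionally (and, per Harris, semantically) similar, while "coffee" and "car" share no contexts at all.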

Machine Translation
Statistical Machine Translation
Word Alignment
Phrase-based statistical Machine Translation
Word Sense Disambiguation in Statistical Machine Translation
Word Sense Disambiguation
Word Sense Disambiguation approaches
Findings
Conclusion