Innovations in Parallel Corpus Search Tools

Martin Volk, Johannes Graen, Elena Callegaro
University of Zurich, Zürich, Switzerland

doi:10.5167/uzh-97282

Abstract

Recent years have seen growing interest in parallel corpora and growing availability of them. Large corpora from international organizations (e.g. the European Union, the United Nations, the European Patent Office) and from multilingual Internet sites (e.g. OpenSubtitles) are now easily available; they are used for statistical machine translation and also for online search by different user groups. This paper gives an overview of these usages and of the different types of search systems. In the past, parallel corpus search systems were based on sentence-aligned corpora. We argue that automatic word alignment allows for major innovations in searching parallel corpora. Some online query systems already employ word alignment for sorting translation variants, but none supports the full query functionality that has been developed for parallel treebanks. We propose to develop such a system for efficiently searching large parallel corpora with a powerful query language.
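
To illustrate what word alignment adds over sentence alignment alone, the following minimal sketch counts the target-language words aligned to a query word and sorts them by frequency. It is a toy illustration and not the system proposed in the paper: the corpus data, the alignment links, and the function name `translation_variants` are all assumptions made for this example.

```python
from collections import Counter

# Toy sentence-aligned corpus with word alignment links (invented data).
# Each entry: (source tokens, target tokens, links), where every link is a
# (source_index, target_index) pair produced by some automatic word aligner.
corpus = [
    (["the", "house", "is", "old"], ["das", "Haus", "ist", "alt"],
     [(0, 0), (1, 1), (2, 2), (3, 3)]),
    (["a", "big", "house"], ["ein", "großes", "Haus"],
     [(0, 0), (1, 1), (2, 2)]),
    (["the", "house", "burned"], ["das", "Gebäude", "brannte"],
     [(0, 0), (1, 1), (2, 2)]),
]


def translation_variants(corpus, query):
    """Return the target words aligned to `query`, most frequent first."""
    counts = Counter()
    for src, tgt, links in corpus:
        for s, t in links:
            # Follow each alignment link whose source token matches the query.
            if src[s].lower() == query.lower():
                counts[tgt[t]] += 1
    return counts.most_common()


print(translation_variants(corpus, "house"))
```

With sentence alignment only, a search for "house" could return the three matching sentence pairs but could not tell which target word corresponds to the query; the alignment links are what make the variant ranking possible.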
