Parallel Corpus of Croatian-Italian Administrative Texts

Marija Brkic Bakaric, Ivana Lalli Pacelat

doi:10.26615/issn.2683-0078.2019_002

Abstract

Parallel corpora constitute a unique resource for assisting human translators. The selection and preparation of parallel corpora also condition the quality of the resulting machine translation (MT) engine. Since Croatian is the national language and Italian is officially recognized as a minority language in seven cities and twelve municipalities of Istria County, a large number of parallel texts are produced on a daily basis. However, there have been no attempts to use these texts for compiling a parallel corpus. A domain-specific, sentence-aligned parallel Croatian-Italian corpus of administrative texts would be of high value in creating different language tools and resources. The aim of this paper is, therefore, to explore the value of parallel documents that are publicly available, mostly in PDF format, and to investigate the use of automatically built dictionaries in corpus compilation. The effects that document format (and, consequently, sentence splitting) and dictionary input have on the sentence alignment process are evaluated manually.
