Sentence splitting in Arabic to Spanish translation

Juan Roldán,Manuel Feria García

doi:10.1075/resla.21008.rol

Abstract

Abstract Modern Standard Arabic makes extensive use of coordination particles whereas punctuation marks are scarce and erratic, leading to long clauses. This is generally assumed to hinder Sentence Boundary Detection and to promote sentence splitting when translating from Arabic into English. Previous literature on translation from Arabic to Spanish is practically inexistent. We have tested this hypothesis regarding translation from Arabic to Spanish on a sample of 282,714 graphic words extracted from a bilingual corpus of 8,681,110 graphic words and found that each Arabic sentence yielded an average of 1.5 Spanish sentences. Furthermore, our data shows the potential impact of directionality in that sentence splitting when translating from Arabic into Spanish is 50% more frequent than from English into Arabic. We also determined statistically that five elements (wa [و], ḥaythu [حيث], kamā [كما], wa-qad [وقد], and wa-dhalika [وذلك]) are the most salient potential markers for sentence splitting in the resulting Spanish translations. Our findings should be particularly interesting for Computational Linguistics and translator training.

Full Text