Abstract

This paper describes in depth the data collection and exploitation stages in constructing the Undergraduate Learner Translator Corpus (ULTC), a 75-million-word, sentence-aligned, bidirectional parallel corpus of Arabic, English, and French, with Arabic as its central language. We focus on the methodological challenges and describe the compilation process and the problems encountered in the first phase of the project. Our aim is to inform future compilers of similar projects that integrate learner corpus research (LCR) and corpus-based translation studies (CBTS). In the first part, we present the design considerations, the data collection criteria, and the exploitation of the corpus; in the second part, we evaluate the systems we used and discuss possible improvements.
