Abstract

Literary works are becoming increasingly available in electronic formats, quickly transforming editorial processes and reading habits. In the context of the global enthusiasm for multilingualism, the rapid spread of e-book readers such as Amazon Kindle® or Kobo Touch® fosters the development of a new generation of reading tools for bilingual books. In particular, literary works available in several languages offer an attractive perspective for self-development or everyday leisure reading, but also for activities such as language learning, translation, or literary studies. An important issue in the automatic processing of multilingual e-books is the alignment between textual units. Alignment could help identify corresponding text units in different languages, which would be particularly beneficial to bilingual readers and translation professionals. Computing automatic alignments for literary works, however, is more challenging than for better-behaved corpora such as parliamentary proceedings or technical manuals. In this paper, we revisit the problem of computing high-quality alignments for literary works. We first perform a large-scale evaluation of automatic alignment for literary texts, which provides a fair assessment of the actual difficulty of this task. We then introduce a two-pass approach based on a maximum entropy model. Experimental results for novels available in English and French or in English and Spanish demonstrate the effectiveness of our method.
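The abstract does not detail the two-pass maximum entropy method, so the sketch below only illustrates the underlying problem of sentence alignment: a dynamic program that groups source and target sentences into 1-1, 1-0, 0-1, 2-1 and 1-2 links, scored by a simplified character-length cost in the spirit of classic length-based aligners. It is not the authors' approach; the cost function, `SKIP_PENALTY`, `MERGE_PENALTY`, and the names `bead_cost` and `align` are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Allowed link types ("beads"): (number of source sentences, number of target sentences).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
# Hypothetical penalties: leaving a sentence unaligned is expensive,
# merging two sentences into one link is mildly discouraged.
SKIP_PENALTY = 10.0
MERGE_PENALTY = 0.5


def bead_cost(src_len: int, tgt_len: int, bead: Tuple[int, int]) -> float:
    """Simplified cost of one link, based on relative character-length mismatch."""
    if bead in ((1, 0), (0, 1)):
        return SKIP_PENALTY
    cost = abs(src_len - tgt_len) / max(src_len + tgt_len, 1)
    if bead != (1, 1):
        cost += MERGE_PENALTY
    return cost


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return a minimum-cost monotonic alignment as (source indices, target indices) links."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Every bead moves forward in row-major order, so cost[i][j] is final when visited.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + bead_cost(s_len, t_len, (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the links.
    links = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        links.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    links.reverse()
    return links
```

For example, `align(["a" * 10, "b" * 10, "c" * 20], ["x" * 20, "y" * 20])` links the first two source sentences to the first target sentence and the third to the second. Literary translations, with their frequent merges, splits and omissions, are precisely where such purely length-based scores tend to break down, which is one motivation for richer models.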

Highlights

  • In the digital era, more and more books are becoming available in electronic form

  • We have used two sets of manually aligned literary works: one is an extract of the BAF corpus (Simard, 1998), consisting of one book by Jules Verne, De la terre à la lune; the other has been developed for a preliminary study described by Yu et al. (2012a), and is made up of four novels translated from French into English and three from English into French

  • This paper has presented a large-scale study of sentence alignment using a small corpus of reference alignments, and two large corpora containing dozens of coarsely aligned copyright-free novels for English-Spanish and English-French language pairs


Summary

Introduction

More and more books are becoming available in electronic form. Devices such as Amazon Kindle® and Kobo Touch® have made e-books an accepted reading option for the general public. Works of fiction account for a major part of the e-book market. Global economic and cultural exchange facilitates the dissemination of literature, and many works of fiction nowadays target an international audience. Successful books are pre-sold and translated very rapidly to reach the largest possible readership. Multiple versions of e-books constitute a highly valuable resource for a number of uses, such as language learning (Kraif and Tutin, 2011) or translation studies.
