Building Multilingual Language Resources in Web Localisation: A Crowdsourcing Approach

Asanka Wasala, Reinhard Schäler, Chris Exton, Jim Buckley, Ruvan Weerasinghe

doi:10.1007/978-3-642-35085-6_3

Abstract

Before User Generated Content (UGC) became widespread, the majority of web content was generated for a specific target audience and in the language of that audience. When information was to be published in multiple languages, this was done using well-established localisation methods. The growth in UGC has raised a number of issues that seem incompatible with the traditional model of software localisation. First and foremost, the number of content contributors has increased hugely. As a by-product of this development, we are also witnessing a large expansion in the scale and variety of the content. Consequently, meeting the demand for localisation through traditional means (existing language resources, a professional pool of translators, and localisation experts) has become unsustainable. Additionally, the requirements and nature of the translation itself are shifting: the more web-based communities multiply in scale, type and geographical distribution, the more varied and global their requirements become. However, the growth in UGC also presents a number of localisation opportunities. In this chapter, we investigate web-enabled collaborative construction of language resources (translation memories) using micro-crowdsourcing approaches, as a means of addressing the diversity and scale issues that arise in UGC contexts and in software systems generally. As the proposed approaches are based on the expertise of human translators, they also address many of the quality issues associated with machine translation (MT)-based solutions. The first example we provide describes a client-server architecture (UpLoD) in which individual users translate elements of an application and its documentation as they use them, in return for free access to that application. Periodically, the elements of the system and documentation translated by individual users are gathered centrally and aggregated into an integral translation of all, or parts of, the system, which can then be redistributed to the system’s users. This architecture is shown to feed into the design of a browser extension-based client-server architecture (BE-COLA) that allows for the capturing and aligning of source and target content produced by the ‘power of the crowd’. The architectural approach chosen enables collaborative, in-context, real-time localisation of web content supported by the crowd, together with the generation of high-quality language resources.

Full Text