WikiBABEL

A Kumaran, K Saravanan, Sandor Maurice

doi:10.1145/1822258.1822277

Abstract

In this paper, we present a collaborative framework, wikiBABEL, for the efficient and effective creation of multilingual content by a community of users. The wikiBABEL framework leverages the availability of fairly stable content in a source language (typically English) and a reasonable, though not necessarily perfect, machine translation system between the source language and a given target language to create rough initial content in the target language, which is published on a collaborative platform. The platform provides an intuitive user interface and a set of linguistic tools for collaborative correction of the rough content by a community of users, aiding the creation of clean content in the target language. We describe the architectural components implementing the wikiBABEL framework, namely the systems for source and target language content management, the mechanisms for coordination and collaboration, and an intuitive user interface for multilingual editing and review. Importantly, we discuss the integrated linguistic resources and tools, such as bilingual dictionaries and machine translation and transliteration systems, that help users during the content correction and creation process. In addition, we analyze and present the prime factors (user-interface features or linguistic tools and resources) that significantly influence the user experience in multilingual content creation.

Beyond the creation of multilingual content, another significant motivation for the wikiBABEL framework is the creation of parallel corpora as a by-product. Parallel linguistic corpora are very valuable resources for both Statistical Machine Translation (SMT) and Cross-lingual Information Retrieval (CLIR) research, and may be mined effectively from multilingual data with significant content overlap, such as that created in the wikiBABEL framework. Creation of parallel corpora by professional translators is very expensive, and hence SMT and CLIR research has been largely confined to a handful of languages. Our attempt to engage the large and diverse Internet user population may aid the economical creation of such linguistic resources, and may make computational linguistics research possible and practical in many languages of the world.

