Abstract

There are more than 7,000 languages in the world. However, 95% of the world's population speaks only 5% of them, at most 400 languages, and more than half of the world's languages have fewer than 10,000 speakers. In 2010, UNESCO released a list of 2,464 endangered languages, 144 of which are in Indonesia. To preserve these languages and increase their use, we started the Indonesia Language Sphere project, which aims to develop comprehensive sets of bilingual dictionaries for Indonesian ethnic languages. To this end, we propose a generalized bilingual lexicon induction method that combines pairs of existing dictionaries. Furthermore, to reduce the total cost of bilingual dictionary creation, we combine machine and manual creation processes and construct a planner that optimizes the order in which dictionaries are created. This paper introduces the proposed methods and reports preliminary experimental results for Indonesian, Malay, Javanese, Sundanese, and Minangkabau.
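
The abstract does not specify how pairs of existing dictionaries are combined. As a rough illustration only, the sketch below assumes the simplest form of that idea, composing two dictionaries through a shared pivot language and counting how many pivot words support each candidate pair. The function name `induce_via_pivot`, the support-count score, and the placeholder word tokens are all assumptions for illustration, not the method proposed in the paper.

```python
from collections import defaultdict


def induce_via_pivot(src_to_pivot, pivot_to_tgt):
    """Compose two bilingual dictionaries through a shared pivot language.

    Both arguments map a word to a set of translations. The result maps each
    source word to {target word: number of pivot words linking the pair}.
    This is a plain pivot composition (an assumption), not the paper's method.
    """
    candidates = defaultdict(lambda: defaultdict(int))
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in set(pivot_words):
            for tgt_word in set(pivot_to_tgt.get(pivot_word, ())):
                # Each distinct pivot word adds one unit of support.
                candidates[src_word][tgt_word] += 1
    return {src: dict(tgts) for src, tgts in candidates.items()}


# Toy data with placeholder tokens (hypothetical, not real dictionary entries).
jv_to_id = {"jv_a": {"id_x", "id_y"}, "jv_b": {"id_y"}}
id_to_ms = {"id_x": {"ms_1"}, "id_y": {"ms_1", "ms_2"}}

induced = induce_via_pivot(jv_to_id, id_to_ms)
for src_word in sorted(induced):
    print(src_word, sorted(induced[src_word].items()))
```

Pairs supported by several pivot words are more plausible translations than pairs reached through a single, possibly polysemous, pivot word, so a count like this could serve as a crude filter before manual checking.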
