Abstract

This paper presents a large-scale language model for daily-generated, large-size text corpora, built with Hadoop in a cloud environment to improve the performance of a human–robot interaction system. Our large-scale trigram language model, consisting of 800 million trigram counts, was implemented through a new approach using a representative cloud service (Amazon EC2) and a representative distributed-processing framework (Hadoop). We performed trigram count extraction with Hadoop MapReduce to adapt our large-scale language model. We estimate that three hours on six servers suffice to extract trigram counts from a 200-million-word Twitter text corpus, which is approximately the volume of Twitter text generated per day.

Keywords: Language model, Large-scale, Cloud, Human–robot interaction
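The trigram count extraction described above follows the classic MapReduce word-count pattern: a mapper emits a count of 1 for each trigram in a line, and a reducer sums the counts per trigram after the shuffle. As a minimal sketch (not the authors' actual Hadoop job), the map and reduce steps can be simulated in plain Python; the helper names and the tiny example corpus below are illustrative assumptions:

```python
from collections import defaultdict

def map_trigrams(line):
    """Mapper (illustrative): emit (trigram, 1) pairs for one line of text."""
    words = line.split()
    for i in range(len(words) - 2):
        yield (tuple(words[i:i + 3]), 1)

def reduce_counts(pairs):
    """Reducer (illustrative): sum counts per trigram key.

    In Hadoop the shuffle phase groups pairs by key between map and
    reduce; here grouping and summing are done together in one pass.
    """
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Tiny stand-in corpus; the paper's corpus is ~200 million words of tweets.
corpus = ["the cat sat on the mat", "the cat sat on the hat"]
pairs = [p for line in corpus for p in map_trigrams(line)]
trigram_counts = reduce_counts(pairs)
# ("the", "cat", "sat") occurs in both lines, so its count is 2
```

In an actual Hadoop job, the mapper and reducer would run as separate tasks over HDFS splits, which is what makes the extraction scale to hundreds of millions of trigram counts across servers.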
