Abstract

This paper presents a large-scale language model for daily-generated, large-size text corpora, built with Hadoop in a cloud environment to improve the performance of a human–robot interaction system. Our large-scale trigram language model, consisting of 800 million trigram counts, was implemented through a new approach using a representative cloud service (Amazon EC2) and a representative distributed-processing framework (Hadoop). We performed trigram count extraction with Hadoop MapReduce to adapt our large-scale language model. We estimate that three hours on six servers suffice to extract trigram counts from a 200-million-word Twitter text corpus, which is approximately the volume of Twitter text generated per day.

Keywords: Language model, Large-scale, Cloud, Human–robot interaction
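The trigram count extraction described above follows the classic MapReduce word-count pattern: a mapper emits a count of 1 for each trigram in a line, and a reducer sums the counts per trigram after the shuffle. As a minimal sketch (not the authors' actual Hadoop job), the map and reduce steps can be simulated in plain Python; the helper names and the tiny example corpus below are illustrative assumptions:

```python
from collections import defaultdict

def map_trigrams(line):
    """Mapper (illustrative): emit (trigram, 1) pairs for one line of text."""
    words = line.split()
    for i in range(len(words) - 2):
        yield (tuple(words[i:i + 3]), 1)

def reduce_counts(pairs):
    """Reducer (illustrative): sum counts per trigram key.

    In Hadoop the shuffle phase groups pairs by key between map and
    reduce; here grouping and summing are done together in one pass.
    """
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Tiny stand-in corpus; the paper's corpus is ~200 million words of tweets.
corpus = ["the cat sat on the mat", "the cat sat on the hat"]
pairs = [p for line in corpus for p in map_trigrams(line)]
trigram_counts = reduce_counts(pairs)
# ("the", "cat", "sat") occurs in both lines, so its count is 2
```

In an actual Hadoop job, the mapper and reducer would run as separate tasks over HDFS splits, which is what makes the extraction scale to hundreds of millions of trigram counts across servers.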
