Abstract

Humanoid robots are expected to be integrated into daily life, where a large variety of human actions and language expressions are observed. To communicate with human partners or to make inferences using language, robots need to learn the referential relations between actions and language and to understand actions in the form of language. Intensive research on imitation learning of human motions has produced robots that can recognize human activity and synthesize human-like motions, and this research has subsequently been extended to the integration of motions and language. The present research aims at developing robots that understand human actions in the form of natural language. One difficulty lies in handling the large variety of words and sentences used in daily life, because it is too time-consuming for researchers to annotate human actions with such varied expressions. Recent developments in information and communication technology have made crowdsourcing an efficient process in which many users are available to complete a large number of simple tasks. This paper proposes a novel concept of collecting a large training dataset of motions and their descriptive sentences through crowdsourcing, and of developing an intelligent framework that learns the relations between the motions and sentences. This framework enables humanoid robots to understand human actions expressed in various forms of sentences. We tested the framework on the recognition of daily full-body human motions and demonstrated its validity.

Highlights

  • Robots are able to understand their surroundings by relying on senses supplied by their body, which they can move to act on the environment

  • Research has been conducted on imitation learning [1,2], where the bodily motions of humans are projected onto the bodily motions of humanoid robots and recorded as dynamical system [3-6] and statistical model [7-10] parameters while compressing the information

  • This paper proposes a novel scheme of collecting a training dataset of human full-body motions and their descriptive sentences via crowdsourcing


Introduction

Robots are able to understand their surroundings by relying on senses supplied by their body, which they can move to act on the environment. Research has been conducted on imitation learning [1,2], where the bodily motions of humans are projected onto the bodily motions of humanoid robots and recorded as dynamical system [3-6] and statistical model [7-10] parameters while compressing the information. By using these models, robots can recognize human bodily motions and generate their own natural, human-like motions. However, this recognition and generation is mediated by indices of the motion models that human partners cannot understand.
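To illustrate the statistical-model approach described above, the following is a deliberately simplified sketch, not the models cited in [7-10]: each motion class is compressed into per-dimension Gaussian parameters (mean and variance) fitted from training samples, and a new motion is recognized by choosing the class whose model assigns it the highest log-likelihood. The motion labels, feature values, and function names here are all hypothetical.

```python
# Illustrative sketch only: compress each motion class into statistical-model
# parameters (a diagonal Gaussian) and recognize new motions by likelihood.
import math

def fit_gaussian(samples):
    """Compress training samples of one motion class into (means, variances)."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
                 for d in range(dims)]
    return means, variances

def log_likelihood(sample, params):
    """Log-probability of a feature vector under a diagonal Gaussian model."""
    means, variances = params
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(sample, means, variances))

def recognize(sample, models):
    """Return the motion label whose model best explains the sample."""
    return max(models, key=lambda label: log_likelihood(sample, models[label]))

# Toy joint-angle features for two hypothetical motion classes.
models = {
    "wave":  fit_gaussian([[1.0, 0.2], [1.1, 0.1], [0.9, 0.3]]),
    "squat": fit_gaussian([[0.1, 1.0], [0.2, 1.2], [0.0, 0.9]]),
}
print(recognize([1.05, 0.15], models))  # → wave
```

Note that the recognized class is returned only as an opaque label ("wave"), which mirrors the limitation stated above: the model's internal indices carry no linguistic meaning that a human partner could understand.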

