Abstract

This paper presents a novel approach to learning the relations among motions, objects, and language, and to generating sentences that describe human actions. Our approach categorizes human motions and the objects acted on by those motions, and then integrates the motion and object categories with their descriptive sentences. The integration consists of two steps. The first step stochastically learns the relations among the motions, objects, and words in the sentences. The second step stochastically learns the order of words in the sentences, which serves as the sentence structure. The model derived in the first step is referred to as the “action language” model, and that derived in the second step as the “natural language” model. This framework for integrating an action language model with a natural language model can be applied to generating descriptive sentences from human actions: each action is recognized as a pair consisting of a motion category and an object category, the words relevant to the action are generated from those motion and object categories, and the words are then arranged into a descriptive sentence. More formally, our approach uses the action language model to search for multiple words that are likely to be generated from the motion and object categories, and then uses the natural language model to search for a sequence of the obtained words that is likely to be generated. We tested the proposed approach to sentence generation by applying it to human action data captured by an RGB-D sensor, and demonstrated its validity.
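
The two-step search described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the form of either model, so the action language model is stood in for by a lookup table of word probabilities per (motion, object) pair, the natural language model by a bigram table, and all probabilities, category names, and function names are hypothetical toy values rather than the authors' method or data.

```python
import itertools
import math

# Hypothetical toy "action language" model:
# P(word | motion category, object category). Values are invented.
ACTION_LM = {
    ("drink", "cup"): {
        "a": 0.30, "person": 0.25, "drinks": 0.25, "tea": 0.15, "throws": 0.05,
    },
}

# Hypothetical toy "natural language" model: bigram P(next word | previous word).
# A bigram model is an assumption; the abstract only says word order is learned.
BIGRAM = {
    ("<s>", "a"): 0.6,
    ("a", "person"): 0.7,
    ("person", "drinks"): 0.5,
    ("drinks", "tea"): 0.4,
    ("tea", "</s>"): 0.8,
}
FLOOR = 1e-6  # probability assigned to word pairs missing from the table


def top_words(motion, obj, k):
    """Step 1: the k words most likely under the action language model."""
    dist = ACTION_LM[(motion, obj)]
    return sorted(dist, key=dist.get, reverse=True)[:k]


def sequence_log_prob(words):
    """Log-probability of one word order under the bigram model."""
    tokens = ["<s>", *words, "</s>"]
    return sum(math.log(BIGRAM.get(pair, FLOOR)) for pair in zip(tokens, tokens[1:]))


def describe(motion, obj, k=4):
    """Step 2: pick the ordering of the step-1 words with the highest score."""
    words = top_words(motion, obj, k)
    best = max(itertools.permutations(words), key=sequence_log_prob)
    return " ".join(best)


print(describe("drink", "cup"))  # prints "a person drinks tea"
```

Enumerating every permutation is only workable for a handful of words; it is used here to keep the sketch short, not because the paper searches this way.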
