Abstract

Recognizing manipulations performed by a human, and transferring and executing them with a robot, is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. Thereby, we encode the essential changes in a visual scene in a condensed way such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this, we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments changes in a discontinuous way, and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing between different manipulations. Employing simple sub-string search algorithms, SECs can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
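
To make the encoding concrete, the following is a minimal Python sketch of how a SEC could be assembled from per-frame spatial relations and how chains could be compared by sub-string search. The relation codes (0 = no contact, 1 = overlapping, 2 = touching), the segment labels, and the function names are illustrative assumptions, not the paper's implementation; the segment-tracking and graph-construction pipeline is omitted entirely.

    # Minimal SEC sketch. Relation codes are an assumption
    # (0 = no contact, 1 = overlapping, 2 = touching); the paper's
    # exact labels and its segment-tracking pipeline are omitted.
    def build_sec(relations_over_time):
        """Compress per-frame relations into a semantic event chain.

        relations_over_time: list of dicts mapping a segment pair
        (i, j) to its spatial-relation code in that frame. A column
        is kept only when at least one relation changes, i.e. at a
        topological transition of the scene graph.
        """
        rows = sorted(relations_over_time[0])  # fixed order of segment pairs
        columns, last = [], None
        for frame in relations_over_time:
            col = tuple(frame[r] for r in rows)
            if col != last:                    # decisive time point
                columns.append(col)
                last = col
        return rows, columns

    def row_similarity(a, b):
        """Normalized longest common substring of two relation rows;
        a simple stand-in for the paper's sub-string search."""
        best = 0
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                    k += 1
                best = max(best, k)
        return best / max(len(a), len(b))

    # Toy 'touching' episode: hand (1), object (2), support (3).
    frames = [
        {(1, 2): 0, (1, 3): 0, (2, 3): 2},  # hand away, object on support
        {(1, 2): 2, (1, 3): 0, (2, 3): 2},  # hand touches object
        {(1, 2): 0, (1, 3): 0, (2, 3): 2},  # hand withdraws
    ]
    rows, sec = build_sec(frames)
    # rows == [(1, 2), (1, 3), (2, 3)]
    # sec  == [(0, 0, 2), (2, 0, 2), (0, 0, 2)]  (one column per event)

Each row of the resulting matrix (one per segment pair) can then be compared across two videos with row_similarity, which is the sense in which type-similar manipulations yield matching sub-strings.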

Highlights

  • It has long been known that raw observation and naive copying are insufficient for a robot to execute an action

  • In this paper we introduce a novel representation for manipulations, called the semantic event chain (SEC), which focuses on the relations between objects in a scene

  • The representation generates column vectors in a matrix, where every transition between neighboring vectors can be interpreted as an action rule defining which object relations have changed in the scene
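
Continuing the hypothetical sketch above (same assumed encoding), an 'action rule' in this sense can be read off by listing which segment-pair relations differ between two neighboring columns:

    # Illustrative only: derive action rules from neighboring SEC columns,
    # using rows/columns as produced by build_sec() above.
    def action_rules(rows, columns):
        rules = []
        for prev, curr in zip(columns, columns[1:]):
            changed = [(pair, a, b)                 # (segment pair, old, new)
                       for pair, a, b in zip(rows, prev, curr) if a != b]
            rules.append(changed)
        return rules

    # For the toy episode above:
    # action_rules(rows, sec) == [[((1, 2), 0, 2)],   # hand-object contact begins
    #                             [((1, 2), 2, 0)]]   # hand-object contact ends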


Introduction

It has long been known that raw observation and naive copying are insufficient for a robot to execute an action. The last two aspects, (5) and (6), would very practically allow human access: for debugging and improving the algorithm(s), for better understanding and possibly interacting with the artificial system, and for entering model-based knowledge. To arrive at such a representation is a very difficult problem, and commonly one uses models of objects (and hands) and trajectories to encode a manipulation (see the Related work section for a discussion of the relevant literature). In this study, our goal is to introduce the so-called ‘semantic event chain’ (SEC) as a novel, generic encoding scheme for manipulations which, to a large degree, fulfills the above-introduced requirements (grounded, learnable, invariant, compressed, and human-comprehensible). We show that these SECs can be used to allow an agent to learn, by observation, to distinguish between different manipulations and to classify parts of the observed scene. Parts of this study have been published at a conference (Aksoy et al., 2010).
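
As a rough illustration of such observation-based recognition, reusing the hypothetical sketches above, an observed chain could be classified by comparing it against stored model SECs. Note this naively aligns rows in order, whereas the paper searches over row correspondences; the function and variable names are assumptions for illustration.

    # Hypothetical recognition step: pick the model SEC whose rows best
    # match the observed SEC under the sub-string similarity above.
    def classify(observed, models):
        """observed: (rows, columns); models: dict name -> (rows, columns)."""
        def score(a, b):
            ra, rb = list(zip(*a[1])), list(zip(*b[1]))  # transpose: one row per pair
            n = min(len(ra), len(rb))
            return sum(row_similarity(ra[i], rb[i]) for i in range(n)) / max(n, 1)
        return max(models, key=lambda name: score(observed, models[name]))

    # e.g. classify((rows, sec), {'touching': model_a, 'pushing': model_b})
    # returns the name of the best-matching manipulation class.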

Related work
Recognition of manipulations
Recognition of human motion patterns
Object recognition and the role of context
Overview of the algorithm
Discussion
Related approaches
Features and problems of the algorithm
Affordances and object–action complexes