Abstract

This paper discusses the problem of recognizing interaction-level human activities from a first-person viewpoint. The goal is to enable an observer (e.g., a robot or a wearable camera) to understand `what activity others are performing toward it' from continuous video inputs. These include friendly interactions such as `a person hugging the observer' as well as hostile interactions like `punching the observer' or `throwing objects at the observer', whose videos involve a large amount of camera ego-motion caused by physical interactions. The paper investigates multi-channel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers temporal structures displayed in first-person activity videos. Furthermore, we present a novel algorithm for early recognition (i.e., prediction) of activities from first-person videos, which allows us to infer ongoing activities at their early stage. In our experiments, we not only show classification results with segmented videos, but also confirm that our new approach is able to detect activities from continuous videos and perform early recognition reliably.
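
To make the multi-channel kernel idea concrete, the sketch below shows one common way to combine per-channel feature histograms (e.g., one channel for global camera motion, one for local descriptors) via a weighted sum of chi-squared kernels. This is an illustrative example only, not the paper's exact formulation; the channel names, weighting scheme, and helper functions are assumptions.

```python
import numpy as np

def chi2_kernel(X, Y, gamma=1.0):
    """Exponentiated chi-squared kernel between histogram rows of X and Y.
    K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))."""
    K = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        num = (x - Y) ** 2
        den = x + Y + 1e-10          # small constant avoids division by zero
        K[i] = (num / den).sum(axis=1)
    return np.exp(-gamma * K)

def multi_channel_kernel(channels_a, channels_b, weights=None):
    """Combine per-channel kernels (e.g., global-motion and local-motion
    histograms) by a weighted sum -- one standard multi-channel strategy."""
    if weights is None:
        weights = [1.0 / len(channels_a)] * len(channels_a)
    K = np.zeros((channels_a[0].shape[0], channels_b[0].shape[0]))
    for w, Xa, Xb in zip(weights, channels_a, channels_b):
        K += w * chi2_kernel(Xa, Xb)
    return K

# Hypothetical usage: global_hist and local_hist would be per-video
# histograms of global camera motion and local spatio-temporal features.
# The resulting kernel matrix could be passed to a kernel classifier,
# e.g. sklearn.svm.SVC(kernel="precomputed").
```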
