Abstract

Temporal segmentation of human motion into actions is central to understanding human motion and to building computational models for activity recognition. Several issues make temporal segmentation and classification of human motion challenging: the large variability in the temporal scale and periodicity of human actions, the complexity of representing articulated motion, and the exponential number of possible movement combinations. We provide initial results from investigating two distinct problems: classification of the overall task being performed, and the more difficult problem of classifying individual frames over time into specific actions. We explore first-person sensing through a wearable camera and inertial measurement units (IMUs) for temporally segmenting human motion into actions and performing activity classification in the context of cooking and recipe preparation in a natural environment. We present baseline results for supervised and unsupervised temporal segmentation and for recipe recognition on the CMU Multimodal Activity Database (CMU-MMAC).

Highlights

  • Temporal segmentation of human motion into actions is central to understanding human motion and to building computational models for activity recognition

  • In this work we explore the use of Inertial Measurement Units (IMUs) and a first-person camera for overall task classification, action segmentation and action classification in the context of cooking and preparing recipes in an unstructured environment

  • As a first step to exploring this space, we investigate the feasibility of standard supervised and unsupervised Gaussian Mixture Models (GMMs), Hidden Markov Models (HMMs), and K-Nearest Neighbor (K-NN) techniques for action segmentation and classification on these two modalities
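As a rough illustration of how an unsupervised GMM/HMM pipeline of this kind can operate on inertial data, the sketch below windows a raw IMU stream into simple statistics and fits a Gaussian HMM whose state changes act as candidate action boundaries. It is a minimal sketch under stated assumptions (synthetic input, hand-picked window size and state count, hmmlearn as the HMM library), not the implementation evaluated on CMU-MMAC.

```python
# Minimal sketch: unsupervised temporal segmentation of an IMU stream with a
# Gaussian HMM (hmmlearn). Window size, hop, and the number of hidden states
# are illustrative assumptions, not values from the paper.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def window_features(imu, win=30, hop=15):
    """Per-window mean and standard deviation of each IMU channel."""
    feats = []
    for start in range(0, len(imu) - win + 1, hop):
        seg = imu[start:start + win]
        feats.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
    return np.asarray(feats)

# Synthetic stand-in for one recording: T samples x 6 channels
# (3-axis accelerometer + 3-axis gyroscope).
rng = np.random.default_rng(0)
imu_stream = rng.normal(size=(3000, 6))

X = window_features(imu_stream)

# One hidden state per hypothesised action; 10 is an arbitrary choice here.
hmm = GaussianHMM(n_components=10, covariance_type="diag", n_iter=50, random_state=0)
hmm.fit(X)
states = hmm.predict(X)                            # per-window state labels
boundaries = np.flatnonzero(np.diff(states)) + 1   # windows where the state switches
print("first states:", states[:20])
print("candidate boundaries (window indices):", boundaries[:10])
```

A supervised GMM variant would instead fit one mixture per labelled action and assign each window to the class with the highest likelihood.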


Summary

Introduction

Temporal segmentation of human motion into actions is central to understanding human motion and to building computational models for activity recognition. Although previous research has shown promising results, recognizing human activities and factorizing human motion into primitives and actions (i.e., temporal segmentation) remain unsolved problems in human motion analysis. In this work we explore the use of Inertial Measurement Units (IMUs) and a first-person camera for overall task classification, action segmentation, and action classification in the context of cooking and preparing recipes in an unstructured environment. This paper provides baseline results for recipe classification, action segmentation, and action classification on the Carnegie Mellon University Multimodal Activity (CMU-MMAC) database [6].
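For the supervised side, the highlights also mention K-Nearest Neighbor classification; the snippet below is a hedged sketch of per-window action classification with scikit-learn's KNeighborsClassifier on the same kind of windowed IMU features. The feature dimensionality, label set, and train/test split are illustrative assumptions rather than the CMU-MMAC evaluation protocol.

```python
# Minimal sketch: supervised per-window action classification with K-NN
# (scikit-learn). Features, labels, and the split are synthetic placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 12))      # e.g. per-window IMU statistics
y = rng.integers(0, 5, size=500)    # e.g. 5 hypothetical action classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print("window-level accuracy:", accuracy_score(y_te, knn.predict(X_te)))
```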

Previous work
Dataset
Challenges
Variability in action execution
Object recognition and scene detection
Unsupervised segmentation
Task classification from first-person vision
Action segmentation from IMU sensors
Supervised action classification
Findings
Discussion and future work