Abstract

The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods train deep networks to solve the problem, but their training data does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single- and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they generalize well to real image sequences. The code, trained models and the dataset are available for research.

Highlights

  • A significant fraction of videos on the Internet contain people moving (Geman and Geman 2016)

  • Several datasets and benchmarks (Baker et al 2011; Geiger et al 2012; Butler et al 2012) have been established to drive the progress in optical flow. We argue that these datasets are insufficient for the task of human motion estimation and, despite its importance, no attention has been paid to datasets and models for human optical flow

  • We show that the methods trained on the Multi-Human Optical Flow dataset (MHOF) outperform generic flow estimation methods for the pixels corresponding to humans

Summary

Introduction

A significant fraction of videos on the Internet contain people moving (Geman and Geman 2016). Our major contributions in this extended work are: (1) We provide the Single-Human Optical Flow dataset (SHOF) of human bodies in motion with realistic textures and backgrounds, having 146,020 frame pairs for single-person scenarios. (2) We provide the Multi-Human Optical Flow dataset (MHOF), with 111,312 frame pairs of multiple human bodies in motion, with improved textures and realistic visual occlusions, but without (self-)collisions or intersections of body meshes. (3) After masking out the background, we observe improvements of up to 13% for human pixels. (4) We provide the dataset files, dataset rendering code, training code and trained models for research purposes.
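The text here does not say which error measure underlies the "up to 13% for human pixels" figure. As a minimal sketch, assuming the measure is the average end-point error (EPE, the mean Euclidean distance between predicted and ground-truth flow vectors) and that a boolean segmentation mask marks the human pixels, an evaluation restricted to humans could look like the following. The function names `masked_epe` and `relative_improvement` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def masked_epe(flow_pred, flow_gt, human_mask):
    """Average end-point error over the pixels selected by human_mask.

    flow_pred, flow_gt: arrays of shape (H, W, 2) holding (u, v) per pixel.
    human_mask: array of shape (H, W); True marks human pixels (assumed input).
    """
    flow_pred = np.asarray(flow_pred, dtype=float)
    flow_gt = np.asarray(flow_gt, dtype=float)
    mask = np.asarray(human_mask, dtype=bool)
    if not mask.any():
        raise ValueError("human_mask selects no pixels")
    # Per-pixel Euclidean distance between predicted and true flow vectors.
    per_pixel_error = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    # Background pixels are dropped before averaging.
    return float(per_pixel_error[mask].mean())


def relative_improvement(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error
```

Under these assumptions, large errors on the background do not affect the score, so a model that is better only on people still shows a measurable gain.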

Human Motion
Optical Flow Datasets
The Human Optical Flow Dataset
Body Model
Body Poses
Hand Poses
Background
Body Shapes
Hand Texture
Scene Illumination
Increasing Image Realism
Background Texture
Scene Compositing
Segmentation Masks
Rendering and Ground Truth Optical Flow
Learning
Hyperparameters
Data Augmentations
Dataset Details
Comparison on SHOF
Method
Comparison on MHOF
Real Scenes
Findings
Conclusion and Future Work