Abstract

Segmentation of human actions is a major research problem in video understanding. A number of existing approaches demonstrate that performing action segmentation before action recognition leads to better recognition performance. In this paper, we address the problem of action segmentation in an online manner. We first extend a clustering-based image segmentation approach to the temporal domain, generating hierarchical supervoxel levels for action segmentation. We then propose a streaming approach that flattens the hierarchical levels into a single level based on the uniform entropy slice, in order to preserve the important information in the video. The flattened level contains the human silhouette, with the body-part structure assigned distinct labels. We then combine this human structure information with the original video frames to “strengthen” the action in a video, which paves the way for accurate action recognition. The experimental results show that our online approach achieves satisfactory performance on both action segmentation and recognition across several publicly available data sets, including the DAVIS data set, the UCF Sports data set, and the KTH data set.
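The uniform entropy slice referenced above is, in the full paper, posed as an optimization over the supervoxel hierarchy; the snippet below is only a simplified, greedy Python sketch of the flattening idea, in which each coarse region is refined to the finest level whose within-region label entropy stays under a budget. The function name `flatten_hierarchy`, the `target_bits` parameter, and the coarse-to-fine traversal are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def label_entropy(labels):
    """Shannon entropy (in bits) of a discrete labeling."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def flatten_hierarchy(levels, target_bits=2.0):
    """Flatten a supervoxel hierarchy into a single labeling (sketch).

    levels: list of integer label volumes (T x H x W), ordered coarse -> fine,
            all defined over the same voxels.
    For every region of the coarsest level, descend to the finest level whose
    within-region label entropy stays below `target_bits`, so the information
    kept is roughly uniform across regions.
    """
    coarse = levels[0]
    flat = np.zeros_like(coarse)
    next_id = 0
    for region_id in np.unique(coarse):
        mask = coarse == region_id
        chosen = coarse[mask]  # fall back to the coarse region itself
        for level in levels[1:]:
            sub = level[mask]
            if label_entropy(sub) > target_bits:
                break
            chosen = sub
        # re-index so labels remain unique across regions
        _, local = np.unique(chosen, return_inverse=True)
        flat[mask] = local + next_id
        next_id = flat[mask].max() + 1
    return flat
```

In a streaming setting this selection would be applied per temporal chunk rather than over the whole video, which is what allows the flattening to run online.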
