Abstract

Hierarchical models have proven effective for action recognition. However, most existing hierarchy construction methods fail to model the complex motion patterns in videos and are therefore vulnerable to the noise that is inevitable in action videos. We propose a Dynamic Hierarchical Tree (DHT) model to characterize such complex motion for better recognition performance. First, a minimum-maximum dynamic time warping (mmDTW) is developed to produce more stable atomic actions by constraining the minimum and maximum lengths of atomic actions. Then, an aggregation method constructs a DHT for each video by merging atomic actions from the bottom up. Both the similarity between frames and the compatibility of dynamic evolution between frames and segments are exploited in the mmDTW and the aggregation process, making the DHTs better suited to modeling actions in videos. Finally, a k-Nearest Neighbors Edge Pairs (kNNEP) kernel is proposed to compare two DHTs using the mean similarity of their k nearest-neighbor edge pairs.

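To illustrate how a kernel of this kind might be computed, below is a minimal sketch of one plausible reading of the kNNEP idea: each tree edge is summarized by a fixed-length descriptor, all cross-tree edge pairs are scored, and the k highest-scoring pairs are averaged. The edge descriptors, the use of cosine similarity, and the function name `knnep_kernel` are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def knnep_kernel(edges_a, edges_b, k=5):
    """Compare two trees via the mean similarity of their k most similar
    cross-tree edge pairs.

    edges_a: (n, d) array, one descriptor per edge of the first tree.
    edges_b: (m, d) array, one descriptor per edge of the second tree.
    (Hypothetical representation; the abstract does not specify the
    actual edge features used in the paper.)
    """
    # Cosine similarity between every edge of tree A and every edge of tree B.
    a = edges_a / np.linalg.norm(edges_a, axis=1, keepdims=True)
    b = edges_b / np.linalg.norm(edges_b, axis=1, keepdims=True)
    sim = a @ b.T  # (n, m) edge-pair similarity matrix

    # Keep only the k highest-scoring edge pairs and average them.
    k = min(k, sim.size)
    top_k = np.sort(sim, axis=None)[-k:]
    return float(top_k.mean())

# Toy usage: two trees with random 16-dimensional edge descriptors.
rng = np.random.default_rng(0)
print(knnep_kernel(rng.normal(size=(12, 16)), rng.normal(size=(9, 16))))
```

Averaging only the top-k pairs, rather than all pairs, keeps the comparison focused on the best-matching substructures of the two trees, which is consistent with the abstract's description of using the k nearest-neighbor edge pairs.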