Action Recognition With Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion

Weiyao Lin, Jianxin Wu, Bin Sheng, Ke Lu, Xin Liu, Chongyang Zhang, Hongkai Xiong, Bingbing Ni

doi:10.1609/aaai.v32i1.12232

Abstract

Action recognition is an important yet challenging task in computer vision. In this paper, we propose a novel deep-learning-based framework for action recognition that improves recognition accuracy by 1) deriving more precise features for representing actions and 2) reducing the asynchrony between different information streams. We first introduce a coarse-to-fine network, which extracts shared deep features at different action-class granularities and progressively integrates them to obtain a more accurate feature representation of the input action. We further introduce an asynchronous fusion network, which fuses information from different streams by integrating stream-wise features taken at different time points, thereby better leveraging the complementary information the streams carry. Experimental results on action recognition benchmarks demonstrate that our approach achieves state-of-the-art performance.
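To make the idea of asynchronous fusion concrete, the following is a minimal illustrative sketch, not the authors' network. It assumes two streams (for example, an appearance stream and a motion stream) that each produce one feature vector per time step. The function name `async_fuse`, the `offsets` window, and the fusion rule (concatenate the first stream's feature at time t with the second stream's features at times t + d, then average over the valid offsets) are all hypothetical choices. The paper describes a learned fusion network, and this sketch substitutes simple averaging for it.

```python
from typing import List, Sequence

Vector = List[float]


def async_fuse(stream_a: Sequence[Vector], stream_b: Sequence[Vector],
               t: int, offsets: Sequence[int]) -> Vector:
    """Hypothetical asynchronous fusion of two per-time-step feature streams.

    Pairs stream_a at time t with stream_b at each time t + d (d in offsets),
    concatenates every pair, and averages the concatenated vectors. This is a
    plain averaging stand-in, not the learned fusion network from the paper.
    """
    a_t = list(stream_a[t])
    pairs = []
    for d in offsets:
        s = t + d
        # Skip offsets that fall outside the clip boundaries.
        if 0 <= s < len(stream_b):
            pairs.append(a_t + list(stream_b[s]))
    if not pairs:
        raise ValueError("no valid offsets for time step t")
    dim = len(pairs[0])
    # Element-wise mean over all (a_t, b_{t+d}) concatenations.
    return [sum(p[i] for p in pairs) / len(pairs) for i in range(dim)]


if __name__ == "__main__":
    appearance = [[1.0], [2.0], [3.0]]   # toy stream A, one 1-D feature per step
    motion = [[10.0], [20.0], [30.0]]    # toy stream B, one 1-D feature per step
    print(async_fuse(appearance, motion, 1, [-1, 0, 1]))  # [2.0, 20.0]
```

Fusing each time step of one stream with a small window of time steps from the other lets a fused feature draw on motion cues that lead or lag the corresponding appearance cues. This is the kind of stream asynchrony the abstract refers to. A synchronous scheme would use only the offset 0.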
