Action representation and recognition through temporal co-occurrence of flow fields and convolutional neural networks

Hatem A Rashwan, Saddam Abdulwahab, Domenec Puig, Miguel Angel Garcia

doi:10.1007/s11042-020-09194-w



Abstract

Many applications require action recognition, from human-machine interaction to intelligent video surveillance. Action recognition in video sequences cannot rely on simply processing raw color images or optical flow fields. Color images provide appearance information about moving objects but lack motion features. They are also very sensitive to variations in clothing and camera pose, which degrade recognition accuracy. Raw optical flow, in turn, measures instantaneous motion rather than the overall dynamics of an action, and is sensitive to noise. Reliable action recognition therefore requires more robust and meaningful motion features and classifiers. This paper proposes a new action recognition technique based on a deep convolutional neural network (CNN) fed with Histograms of Optical Flow Co-Occurrence (HOF-CO) motion features. HOF-CO is a robust motion representation, previously proposed by the authors, that encodes the relative frequency of pairs of optical flow directions computed at each image pixel. Experimental results show that this approach outperforms state-of-the-art action recognition methods on three public datasets: KTH, UCF-11 YouTube, and HOLLYWOOD2.
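The abstract does not give the exact HOF-CO formulation, so the following is only a minimal sketch of the general idea of a co-occurrence histogram of flow directions. It assumes that the "pairs" are the flow directions observed at the same pixel in two consecutive flow fields, that directions are quantized into 8 bins, and that near-static pixels are ignored. The function names, the bin count, and the magnitude threshold are all hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def quantize_directions(flow, n_bins=8):
    """Map each flow vector (dx, dy) to one of n_bins direction bins.

    `flow` has shape (H, W, 2). The bin count is an illustrative assumption.
    """
    angles = np.arctan2(flow[..., 1], flow[..., 0])  # in [-pi, pi]
    angles = np.mod(angles, 2 * np.pi)               # shifted to [0, 2*pi]
    bins = np.floor(angles / (2 * np.pi) * n_bins).astype(int)
    # Guard against an angle that rounds up to exactly 2*pi.
    return np.minimum(bins, n_bins - 1)


def direction_cooccurrence(flow_t, flow_t1, n_bins=8, min_mag=1e-3):
    """Relative frequency of (direction at t, direction at t+1) per pixel.

    Assumption: co-occurrence is taken between two consecutive flow fields
    at the same pixel location. Pixels whose flow magnitude is below
    `min_mag` in either field are skipped (hypothetical threshold).
    """
    bins_t = quantize_directions(flow_t, n_bins)
    bins_t1 = quantize_directions(flow_t1, n_bins)

    moving = (np.linalg.norm(flow_t, axis=-1) > min_mag) & (
        np.linalg.norm(flow_t1, axis=-1) > min_mag
    )

    hist = np.zeros((n_bins, n_bins), dtype=np.float64)
    # Unbuffered accumulation so repeated (i, j) index pairs are all counted.
    np.add.at(hist, (bins_t[moving], bins_t1[moving]), 1.0)

    total = hist.sum()
    return hist / total if total > 0 else hist
```

Under these assumptions, the resulting `n_bins x n_bins` matrix could be flattened or stacked over time and passed to a CNN as an image-like input. How the features are actually arranged for the network is not stated in the abstract.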
