Abstract

Detecting and classifying human actions in videos is one of the current challenges in visual content analysis and mining. This paper presents a method for fine-grained classification of sport actions using a Siamese Spatio-Temporal Convolutional Neural Network (SSTCNN) that takes RGB images and optical flow fields as input. Our first contribution is a comparison of different optical flow methods and a study of their influence on the classification score. We also present different normalization methods for the optical flow that drastically impact the results, boosting accuracy from 44% to 74%. Our second contribution is the detection and classification of actions in videos using a sliding temporal window, which leads to a satisfying score of 81.3% over the whole TTStroke-21 dataset.
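
As a rough illustration of the pipeline the abstract describes, the sketch below computes dense optical flow with OpenCV's Farneback estimator, rescales the flow, and slices a video into overlapping temporal windows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, window size, stride, and normalization scheme are hypothetical, and the paper's actual flow estimators and network are not reproduced here.

```python
import cv2
import numpy as np

def flow_sequence(gray_frames):
    """Dense optical flow between consecutive grayscale frames (Farneback)."""
    return [
        cv2.calcOpticalFlowFarneback(
            prev, nxt, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        for prev, nxt in zip(gray_frames, gray_frames[1:])
    ]

def normalize_flow(flow, max_mag=None):
    """Rescale the (dx, dy) flow channels to [-1, 1].

    With max_mag=None the per-frame maximum magnitude is used; passing a
    fixed value instead keeps the scale consistent across a whole clip.
    Choices of this kind are what the paper reports as strongly affecting
    accuracy; the exact schemes it compares are not reproduced here.
    """
    mag = np.linalg.norm(flow, axis=2)  # H x W magnitudes
    scale = max_mag if max_mag is not None else max(float(mag.max()), 1e-6)
    return np.clip(flow / scale, -1.0, 1.0)

def sliding_windows(frames, flows, size=32, stride=8):
    """Yield (rgb_clip, flow_clip) pairs over a temporal sliding window."""
    for start in range(0, len(flows) - size + 1, stride):
        yield frames[start:start + size], flows[start:start + size]
```

In a two-stream setup of this kind, each windowed (RGB, flow) pair would be scored by the network and the per-window predictions aggregated into action detections over the full video.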
