Abstract

Detecting and classifying human actions in videos is one of the current challenges in visual content analysis and mining. This paper presents a method for fine-grained classification of sport actions using a Siamese Spatio-Temporal Convolutional Neural Network (SSTCNN) that takes RGB images and optical flow fields as input. Our first contribution is a comparison of different optical flow methods and a study of their influence on the classification score. We also present different normalization methods for the optical flow that drastically impact the results, boosting accuracy from 44% to 74%. Our second contribution is the detection and classification of actions in videos using a sliding temporal window, which leads to a satisfying score of 81.3% over the whole TTStroke-21 dataset.
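
As a rough illustration of the pipeline the abstract describes, the sketch below computes dense optical flow with OpenCV's Farneback estimator, rescales the flow, and slices a video into overlapping temporal windows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, window size, stride, and normalization scheme are hypothetical, and the paper's actual flow estimators and network are not reproduced here.

```python
import cv2
import numpy as np

def flow_sequence(gray_frames):
    """Dense optical flow between consecutive grayscale frames (Farneback)."""
    return [
        cv2.calcOpticalFlowFarneback(
            prev, nxt, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        for prev, nxt in zip(gray_frames, gray_frames[1:])
    ]

def normalize_flow(flow, max_mag=None):
    """Rescale the (dx, dy) flow channels to [-1, 1].

    With max_mag=None the per-frame maximum magnitude is used; passing a
    fixed value instead keeps the scale consistent across a whole clip.
    Choices of this kind are what the paper reports as strongly affecting
    accuracy; the exact schemes it compares are not reproduced here.
    """
    mag = np.linalg.norm(flow, axis=2)  # H x W magnitudes
    scale = max_mag if max_mag is not None else max(float(mag.max()), 1e-6)
    return np.clip(flow / scale, -1.0, 1.0)

def sliding_windows(frames, flows, size=32, stride=8):
    """Yield (rgb_clip, flow_clip) pairs over a temporal sliding window."""
    for start in range(0, len(flows) - size + 1, stride):
        yield frames[start:start + size], flows[start:start + size]
```

In a two-stream setup of this kind, each windowed (RGB, flow) pair would be scored by the network and the per-window predictions aggregated into action detections over the full video.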
