Abstract

The purpose of this study is the learning and classification of video activities using video color and motion information. Video activity labeling is important for many applications, such as video content modeling, indexing, and quick access to content. In this study, video activity recognition is performed by deep learning. To learn the visual features of a video, Convolutional Neural Network (CNN) layers and layers of a special type of recurrent network, Long Short-Term Memory (LSTM), are stacked. Video sequence learning is performed by end-to-end training. Recent work on deep learning employs color and motion information together to improve learning and classification accuracy. In this study, unlike existing models, video motion content is learned using SIFT flow vectors, and motion and color features are fused for activity recognition. Performance tests on a commonly used benchmark data set, UCF-101, which includes activity-labeled videos from 101 action categories such as Biking and Playing Guitar, demonstrate that SIFT flow vectors allow motion information to be modeled more accurately than optical flow vectors and increase video activity classification performance.
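The fusion of motion and color features mentioned above can be sketched minimally. The weighted late-fusion rule below (averaging per-class scores from the two streams) is an assumption for illustration; the abstract does not specify the exact fusion mechanism, and the function and parameter names are hypothetical:

```python
import numpy as np

def fuse_streams(color_scores, motion_scores, alpha=0.5):
    """Late fusion of per-class scores from a color (RGB) stream and a
    motion (SIFT flow) stream. alpha weights the color stream; this
    weighted average is an illustrative choice, not the paper's rule."""
    color_scores = np.asarray(color_scores, dtype=float)
    motion_scores = np.asarray(motion_scores, dtype=float)
    fused = alpha * color_scores + (1.0 - alpha) * motion_scores
    # Predicted activity is the class with the highest fused score.
    return fused, int(np.argmax(fused))

# Toy example with 3 activity classes (e.g. Biking, Playing Guitar, Typing)
color = [0.6, 0.3, 0.1]   # color stream favors class 0
motion = [0.2, 0.7, 0.1]  # motion stream favors class 1
fused, label = fuse_streams(color, motion, alpha=0.5)
```

With equal weights the motion evidence tips the decision to class 1, illustrating how a stream that models motion well (here, the SIFT-flow stream) can correct the color stream's prediction.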
