Extracting Deep Video Feature for Mobile Video Classification with ELU-3DCNN

Jihong Liu, Jing Zhang, Xi Liang, Hui Zhang, Li Zhuo

doi:10.1007/978-981-10-8530-7_15

Abstract

Extracting robust video features has long been a challenge in video classification. Although research on video feature extraction has been active and extensive, classification results based on traditional video features are generally neither flexible nor satisfactory. Recently, deep learning has shown excellent performance in video feature extraction. In this paper, we propose ELU-3DCNN, an improved deep learning architecture, to extract deep video features for video classification. First, ELU-3DCNN is trained with exponential linear units (ELUs) as its activation functions. Then, each video is split into 16-frame clips, with an 8-frame overlap between consecutive clips. These clips are passed through ELU-3DCNN to extract fc7 activations, which are then averaged and normalized to form a 4096-dimensional video feature. Experimental results on the UCF-101 dataset show that ELU-3DCNN improves video classification performance compared with state-of-the-art video feature extraction methods.
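For reference, the ELU activation in its standard form is f(x) = x for x > 0 and f(x) = α(exp(x) − 1) for x ≤ 0, where α > 0 sets the value (−α) that the unit saturates to for large negative inputs. Unlike ReLU, it produces negative outputs and a non-zero gradient (α·exp(x)) for x < 0. The NumPy sketch below implements this element-wise; α = 1.0 is an assumed default, since the abstract does not state the value used in ELU-3DCNN.

```python
import numpy as np


def elu(x, alpha=1.0):
    """Exponential linear unit, applied element-wise.

    alpha=1.0 is an assumed default; the abstract does not give the value
    used in ELU-3DCNN.
    """
    x = np.asarray(x, dtype=np.float64)
    # np.expm1 computes exp(x) - 1 accurately for small |x|.
    # np.minimum(x, 0) keeps expm1 from overflowing on large positive inputs,
    # which np.where would otherwise evaluate and then discard.
    negative_part = alpha * np.expm1(np.minimum(x, 0.0))
    return np.where(x > 0, x, negative_part)


print(elu([-2.0, -0.5, 0.0, 1.5]))
```

Positive inputs pass through unchanged, while negative inputs decay smoothly toward −α instead of being clipped to zero.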

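The clip segmentation and feature pooling steps described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes frames are already decoded into a NumPy array of shape (T, H, W, C), that leftover frames at the end of a video that cannot fill a full clip are discarded, and that "normalized" means L2 normalization. The names `split_into_clips` and `aggregate_fc7` are hypothetical, and random vectors stand in for the fc7 activations that ELU-3DCNN would produce.

```python
import numpy as np

CLIP_LEN = 16     # frames per clip, as stated in the abstract
STRIDE = 8        # 8-frame overlap between 16-frame clips => a new clip every 8 frames
FEAT_DIM = 4096   # fc7 dimensionality, as stated in the abstract


def split_into_clips(frames, clip_len=CLIP_LEN, stride=STRIDE):
    """Split a (T, H, W, C) frame array into overlapping clips.

    Trailing frames that cannot fill a whole clip are dropped; this is an
    assumption, since the abstract does not say how the tail is handled.
    """
    num_frames = frames.shape[0]
    if num_frames < clip_len:
        raise ValueError(f"need at least {clip_len} frames, got {num_frames}")
    starts = range(0, num_frames - clip_len + 1, stride)
    # Result shape: (num_clips, clip_len, H, W, C)
    return np.stack([frames[s:s + clip_len] for s in starts])


def aggregate_fc7(clip_features, eps=1e-12):
    """Average per-clip fc7 vectors, then L2-normalize (L2 is an assumed choice)."""
    feats = np.asarray(clip_features, dtype=np.float64)  # (num_clips, FEAT_DIM)
    mean = feats.mean(axis=0)
    return mean / max(np.linalg.norm(mean), eps)


# Usage with dummy data: 100 small frames and random placeholder fc7 vectors
# standing in for real ELU-3DCNN outputs (the network itself is not shown).
frames = np.zeros((100, 8, 8, 3), dtype=np.uint8)
clips = split_into_clips(frames)
rng = np.random.default_rng(0)
fake_fc7 = rng.standard_normal((clips.shape[0], FEAT_DIM))
video_vec = aggregate_fc7(fake_fc7)
print(clips.shape, video_vec.shape)  # (11, 16, 8, 8, 3) (4096,)
```

With a stride of 8, a T-frame video yields ⌊(T − 16)/8⌋ + 1 clips, so the 100-frame dummy video gives 11 clips, and averaging their fc7 vectors produces a single fixed-length 4096-dimensional descriptor regardless of video length.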