Abstract

With the recent revolution in multimedia technology, video data can be created, stored, and transferred on a massive scale at low cost. This flood of content has pushed the research community to explore a variety of areas that help manage the proliferation of multimedia, such as video structuring, video classification and clustering, event and object detection, video recommendation, and many other video content analysis techniques. The success of any such technique depends on the audiovisual features extracted from the video. Motivated by the effectiveness of deep learning, we propose in this paper a new deep-learning-based feature representation for videos. We rely on image-based features extracted with deep networks from the sequence of frames in a video. A mapping approach named VideoToVecs then transforms the extracted features into a matrix in which each row contains features of the same type; we call this matrix the deep features video matrix. The efficiency of the representation is evaluated on a dataset of 5261 videos for classification and clustering, and the obtained results are very promising, as shown in the paper.
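To make the pipeline concrete, the sketch below illustrates only the frame-level extraction step: sampling frames from a video and stacking their deep features into a matrix. It assumes a pretrained ResNet-50 backbone and OpenCV for frame reading; the actual VideoToVecs mapping, including how features are grouped by type into rows, is not specified in the abstract, so this should be read as an illustrative approximation rather than the paper's method.

```python
# Minimal sketch: per-frame deep features stacked into a matrix.
# Assumption: a pretrained ResNet-50 stands in for the paper's unspecified
# frame-level feature extractor; VideoToVecs itself is not reproduced here.
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pretrained backbone with the classification head removed -> 2048-d features per frame.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def video_feature_matrix(path, every_nth=30):
    """Sample every `every_nth` frame and return an (n_sampled_frames, 2048) matrix."""
    cap = cv2.VideoCapture(path)
    rows, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_nth == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            with torch.no_grad():
                feat = backbone(preprocess(rgb).unsqueeze(0))  # shape (1, 2048)
            rows.append(feat.squeeze(0))
        idx += 1
    cap.release()
    return torch.stack(rows)  # one row of deep features per sampled frame

# Example usage (hypothetical file name):
# matrix = video_feature_matrix("example.mp4")
```

In the paper's representation the rows of the deep features video matrix group features by type; a further mapping step would therefore reorganize or aggregate the per-frame features produced above.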
