Abstract

Video description (VD) aims to automatically generate natural-language descriptions for videos. Owing to its successful implementations and broad range of applications, much work based on deep neural network (DNN) models has been proposed. Taking inspiration from an image captioning model, this paper develops an end-to-end VD model based on long short-term memory (LSTM). A single video feature is fed to the first unit of the LSTM decoder, and each subsequent word of the sentence is generated conditioned on the previously predicted words. Experimental results on two publicly available datasets demonstrate that the proposed model outperforms the baseline.
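To make the decoding scheme concrete, below is a minimal sketch of such a decoder in PyTorch. The class name VideoCaptionDecoder, the layer sizes, the vocabulary size, and the greedy decoding loop are illustrative assumptions for this sketch, not the authors' exact architecture.

```python
# Minimal sketch of an LSTM caption decoder: the video feature is fed to the
# first LSTM step, then each word is generated from the previously predicted word.
# All dimensions and names here are assumptions, not the paper's exact settings.
import torch
import torch.nn as nn

class VideoCaptionDecoder(nn.Module):
    def __init__(self, video_dim=2048, embed_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)  # map video feature into the embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)       # hidden state -> vocabulary logits

    def forward(self, video_feat, max_len=20, bos_id=1):
        # Step 1: feed the single video feature as the first LSTM input.
        x = self.video_proj(video_feat).unsqueeze(1)       # (B, 1, embed_dim)
        _, state = self.lstm(x)
        # Step 2: generate each word conditioned on the previously predicted word.
        word = torch.full((video_feat.size(0),), bos_id,
                          dtype=torch.long, device=video_feat.device)
        tokens = []
        for _ in range(max_len):
            x = self.embed(word).unsqueeze(1)              # (B, 1, embed_dim)
            h, state = self.lstm(x, state)
            logits = self.out(h.squeeze(1))                # (B, vocab_size)
            word = logits.argmax(dim=-1)                   # greedy decoding
            tokens.append(word)
        return torch.stack(tokens, dim=1)                  # (B, max_len) predicted word ids

# Example usage with a pooled CNN feature per video (hypothetical input):
decoder = VideoCaptionDecoder()
video_feat = torch.randn(2, 2048)
caption_ids = decoder(video_feat)                          # (2, 20) word ids
```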
