Abstract

Wildlife videos often exhibit complex dynamics, and generating captions for wildlife clips involves both natural language processing and computer vision. Current video captioning techniques have shown encouraging results, but they derive captions from video frames alone, ignoring the audio track. In this paper we propose to generate natural language video captions from both audio and visual information, using a deep architecture that combines convolutional neural networks (CNNs) for feature extraction with recurrent neural networks (RNNs) for caption generation. Experimental results on a corpus of wildlife clips show that fusing audio information substantially improves video description performance.
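Since the abstract describes the approach only at a high level, the following is a minimal PyTorch sketch of one plausible realization: pre-extracted CNN frame features and audio features are projected into a shared space, fused by concatenation, and fed to an LSTM decoder that emits caption tokens. All class names, dimensions, and the concatenation-based fusion strategy are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical audio-visual captioning model. The abstract does not specify
# the architecture, so every dimension, layer choice, and the fusion scheme
# (temporal mean pooling + concatenation) below is an assumption.
class AudioVisualCaptioner(nn.Module):
    def __init__(self, vocab_size, visual_dim=2048, audio_dim=128,
                 embed_dim=256, hidden_dim=512):
        super().__init__()
        # Project pre-extracted CNN frame features and audio features
        # (e.g. log-mel statistics) into a shared hidden space.
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # RNN decoder conditioned on the fused audio-visual context.
        self.decoder = nn.LSTM(embed_dim + 2 * hidden_dim, hidden_dim,
                               batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, visual_feats, audio_feats, captions):
        # visual_feats: (batch, n_frames, visual_dim) from a CNN encoder
        # audio_feats:  (batch, n_windows, audio_dim)
        # captions:     (batch, seq_len) token ids, used for teacher forcing
        v = self.visual_proj(visual_feats).mean(dim=1)   # temporal mean pool
        a = self.audio_proj(audio_feats).mean(dim=1)
        context = torch.cat([v, a], dim=-1)              # audio-visual fusion
        seq_len = captions.size(1)
        ctx = context.unsqueeze(1).expand(-1, seq_len, -1)
        x = torch.cat([self.embed(captions), ctx], dim=-1)
        hidden, _ = self.decoder(x)
        return self.out(hidden)                          # (batch, seq_len, vocab)

# Toy usage with random tensors standing in for real CNN/audio extractors.
model = AudioVisualCaptioner(vocab_size=1000)
logits = model(torch.randn(2, 8, 2048), torch.randn(2, 8, 128),
               torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```

In a setup like this, the model would typically be trained with token-level cross-entropy against reference captions; more elaborate fusion schemes (e.g. attention over frames and audio windows) are common alternatives to the simple concatenation shown here.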
