Abstract

Unconstrained videos, typically captured by amateurs, suffer from a wide range of quality issues such as poor illumination, considerable camera motion, cluttered backgrounds, occlusion and substantial viewpoint variation. Event recognition in such videos is a challenging computer vision task with several practical applications. In this paper, we propose a novel approach for recognizing events in unconstrained videos that relies on the temporal features of representative frames (key-frames) in each video. The graph-based key-frame extraction technique exploits inter-frame temporal variation across the entire video and reduces temporal redundancy among consecutive frames. Key-frame selection is formulated as a constrained optimization problem in which the temporal distinctness of the chosen frames and the distance separating them are maximized simultaneously. For classification, we propose a model that fuses a deep residual network with a Long Short-Term Memory (LSTM) recurrent neural network to capture long-term temporal motion patterns. Experimental results on the Columbia Consumer Video (CCV), Kodak Consumer Video and UCF-101 benchmark datasets show that the proposed method outperforms most state-of-the-art approaches. The high content diversity of these videos makes classification particularly challenging.
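
The abstract describes fusing a deep residual network with an LSTM over extracted key-frames. The sketch below is a minimal, hedged illustration of that kind of architecture in PyTorch, not the authors' actual implementation: the class name `KeyFrameEventClassifier`, the choice of ResNet-50, the hidden size and the number of classes are all illustrative assumptions, and key-frame extraction is assumed to have already been performed.

```python
# Minimal sketch (illustrative only, not the paper's exact model) of a
# ResNet + LSTM fusion for key-frame-based event recognition.
import torch
import torch.nn as nn
from torchvision import models


class KeyFrameEventClassifier(nn.Module):
    def __init__(self, num_classes, hidden_size=512):
        super().__init__()
        # Pre-trained ResNet-50 backbone used as a per-frame feature extractor.
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop final fc
        self.feat_dim = resnet.fc.in_features  # 2048 for ResNet-50
        # LSTM models long-term temporal patterns across the key-frame sequence.
        self.lstm = nn.LSTM(self.feat_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, frames):
        # frames: (batch, num_keyframes, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.view(b * t, c, h, w)).view(b, t, self.feat_dim)
        out, _ = self.lstm(feats)            # temporal modelling over key-frames
        return self.classifier(out[:, -1])   # event prediction from last time step


# Example usage: a batch of 2 videos, each reduced to 16 key-frames.
model = KeyFrameEventClassifier(num_classes=20)
logits = model(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 20])
```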
