Abstract

In the era of vast and continuous video content creation, manually identifying crucial events becomes a tedious and inefficient task. To address this challenge, we propose a CNN-GRU model that automatically detects and classifies significant events in videos. The model employs a ResNet50 Convolutional Neural Network (CNN) to extract visual features from video frames, followed by Gated Recurrent Units (GRUs) for temporal modelling and event recognition. By leveraging the sequential data handling capabilities of GRUs, the model captures temporal patterns across frames. We evaluate its performance using accuracy and F1-score metrics on the VIRAT dataset, which contains 1,555 events across 12 event classes. Our approach achieves promising results, with an event classification accuracy of 75.22%.
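
Below is a minimal sketch of how such a ResNet50-plus-GRU pipeline could be wired up, assuming PyTorch and torchvision. The 12-class output follows the abstract; the hidden size, clip length, frame resolution, and pooling of the final GRU state are illustrative assumptions, not the authors' reported configuration.

```python
# Sketch of a CNN-GRU event classifier: a ResNet50 backbone extracts per-frame
# features, and a GRU models the temporal sequence before classification.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights


class CNNGRUEventClassifier(nn.Module):
    def __init__(self, num_classes: int = 12, hidden_size: int = 256):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
        # Drop the final fully connected layer; keep the 2048-d pooled features.
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        self.gru = nn.GRU(input_size=2048, hidden_size=hidden_size,
                          batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, seq_len, 3, 224, 224)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.view(b * t, c, h, w))  # (b*t, 2048, 1, 1)
        feats = feats.view(b, t, -1)                   # (b, t, 2048)
        _, last_hidden = self.gru(feats)               # (1, b, hidden_size)
        return self.fc(last_hidden.squeeze(0))         # (b, num_classes)


# Usage example: classify a batch of 2 clips, each with 16 frames.
if __name__ == "__main__":
    model = CNNGRUEventClassifier()
    clips = torch.randn(2, 16, 3, 224, 224)
    logits = model(clips)
    print(logits.shape)  # torch.Size([2, 12])
```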
