Abstract

In this paper, we propose a novel approach to video action recognition that integrates a modified and optimized 3D Convolutional Neural Network, a Long Short-Term Memory (LSTM) network, and an attention mechanism. This combination improves overall performance and offers an advantage over existing methods in handling the intricacies of real-world scenarios. The novelty of our approach lies in its capacity to capture both spatial and temporal information from video sequences, together with an attention mechanism that selectively emphasizes the most informative regions of each sequence, thereby improving recognition accuracy. The model is tailored to complex scenarios, such as scenes with multiple actors or objects and instances of occlusion, and it addresses the subjectivity and variability inherent in action annotations within datasets. We also apply a set of preprocessing techniques to further optimize model performance. Through rigorous experimental evaluation on the benchmark datasets UCF101 and HMDB51, we demonstrate that the proposed approach significantly outperforms existing state-of-the-art methods in action recognition. These results underscore the potential of our approach for further advances in video action recognition research.
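
As a rough illustration of the pipeline described above, the following PyTorch sketch chains a small 3D convolutional feature extractor, a temporal attention layer, and an LSTM-based classifier. The layer sizes, the specific attention formulation, and the class count (101, matching UCF101) are illustrative assumptions and are not taken from the paper's actual configuration.

```python
# Minimal sketch of a 3D CNN -> attention -> LSTM action recognizer.
# All layer sizes and the attention form are illustrative assumptions.
import torch
import torch.nn as nn

class ActionRecognizer(nn.Module):
    def __init__(self, num_classes=101, hidden_size=256):
        super().__init__()
        # 3D convolutions capture joint spatial and temporal patterns in short clips.
        self.conv3d = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)),  # keep temporal dim, shrink spatial dims
        )
        feat_dim = 128 * 4 * 4
        # Additive attention produces one weight per time step.
        self.attn = nn.Sequential(nn.Linear(feat_dim, 128), nn.Tanh(), nn.Linear(128, 1))
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                       # clips: (B, 3, T, H, W)
        x = self.conv3d(clips)                      # (B, 128, T, 4, 4)
        B, C, T, H, W = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(B, T, C * H * W)   # (B, T, feat_dim)
        weights = torch.softmax(self.attn(x), dim=1)             # (B, T, 1)
        x = x * weights                             # emphasize informative time steps
        out, _ = self.lstm(x)                       # model longer-range temporal order
        return self.fc(out[:, -1])                  # logits over action classes

# Usage: two 16-frame RGB clips at 112x112 resolution.
logits = ActionRecognizer()(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 101])
```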
