Abstract
In this paper, we propose a novel approach to video action recognition that integrates a modified and optimized 3D Convolutional Neural Network (3D CNN), a Long Short-Term Memory (LSTM) network, and an attention mechanism. Combining these components improves overall performance and offers an advantage over existing methods in handling the intricacies of real-world scenarios. The strength of our approach lies in its capacity to capture both spatial and temporal information from video sequences and in its attention mechanism, which selectively emphasizes key regions within the sequences, thereby improving recognition accuracy. The model is tailored to complex scenarios, such as those with multiple actors or objects, or instances of occlusion, and it addresses the subjectivity and variability inherent in action annotations within datasets. We also apply an array of preprocessing techniques to further optimize model performance. Through rigorous experimental evaluation on the benchmark datasets UCF101 and HMDB51, we demonstrate that our proposed approach significantly outperforms existing state-of-the-art methods in action recognition. These results underscore the potential of our approach for further advances in video action recognition research.
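To make the described pipeline concrete, the following is a minimal PyTorch sketch of a model that combines a 3D-CNN backbone, an LSTM over per-frame features, and soft attention over time steps. The layer sizes, the two-block backbone, and the attention pooling are illustrative assumptions for exposition, not the paper's exact architecture or hyperparameters.

```python
import torch
import torch.nn as nn


class CNN3DLSTMAttention(nn.Module):
    """Illustrative 3D-CNN + LSTM + temporal-attention classifier."""

    def __init__(self, num_classes: int, hidden_size: int = 256):
        super().__init__()
        # 3D convolutions capture joint spatial-temporal patterns within a clip.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep temporal axis, pool space
        )
        # LSTM models longer-range temporal dependencies across the clip.
        self.lstm = nn.LSTM(64, hidden_size, batch_first=True)
        # Attention scores let the model weight informative time steps.
        self.attn = nn.Linear(hidden_size, 1)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, T, H, W)
        feats = self.backbone(clip)              # (batch, 64, T, 1, 1)
        feats = feats.squeeze(-1).squeeze(-1)    # (batch, 64, T)
        feats = feats.transpose(1, 2)            # (batch, T, 64)
        hidden, _ = self.lstm(feats)             # (batch, T, hidden)
        weights = torch.softmax(self.attn(hidden), dim=1)  # (batch, T, 1)
        pooled = (weights * hidden).sum(dim=1)   # attention-weighted summary
        return self.classifier(pooled)           # per-class logits


# Example: 16-frame RGB clips at 112x112 resolution, 101 classes (as in UCF101).
model = CNN3DLSTMAttention(num_classes=101)
logits = model(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 101])
```

In this sketch, the 3D backbone is responsible for short-range spatial-temporal cues, the LSTM aggregates them across the clip, and the attention weights determine which time steps dominate the final prediction.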