Abstract
Recognizing and understanding human activities in real-time video is challenging due to the complex nature of video data and the need for efficient, accurate analysis. This research introduces a robust framework for video activity recognition that leverages a stacked Bidirectional Long Short-Term Memory (Bi-LSTM) and Gated Recurrent Unit (GRU) architecture within a fusion-based deep model. The stacked Bi-LSTM-GRU model exploits its dual recurrent architecture to capture nuanced temporal dependencies within video sequences, while the fusion-based deep architecture combines spatial and temporal features, enabling the model to discern intricate patterns in human activities. To further enhance the model's discriminative power, we introduce a fusion module in the proposed deep architecture. The fusion module integrates multi-modal features extracted from different levels of the network hierarchy, yielding a more comprehensive representation of video activities. We demonstrate the efficacy of our approach through rigorous experimentation on the UCF50, UCF101, and HMDB51 datasets. On UCF50, the model achieves accuracies of 97.01% and 95.86% on the training and validation sets respectively, showcasing its proficiency in discerning activities across a diverse range of scenarios. On UCF101, the proposed approach achieves competitive accuracies of 97.62% and 96.93% on the training and validation sets, surpassing previous benchmarks by a margin of approximately 1%. Furthermore, on the challenging HMDB51 dataset, the model demonstrates robust accuracies of 89.71% and 88.88% on the training and validation sets, solidifying its efficacy in intricate action-recognition tasks.
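To make the fusion idea concrete, the sketch below illustrates one common way to integrate features from different levels of a network hierarchy: pooling a mid-level spatial feature map and concatenating it with a temporal feature vector from a recurrent branch. The function names, feature shapes, and pooling choice are illustrative assumptions for exposition, not details taken from the paper.

```python
import numpy as np

# Illustrative sketch (assumed design, not the paper's exact fusion module):
# multi-level features are pooled to fixed-size vectors and concatenated.

def global_average_pool(feature_map):
    """Pool a (H, W, C) spatial feature map down to a (C,) vector."""
    return feature_map.mean(axis=(0, 1))

def fuse_features(spatial_map, temporal_vec):
    """Concatenate pooled spatial features with a recurrent (temporal) feature vector."""
    return np.concatenate([global_average_pool(spatial_map), temporal_vec])

# Example: a 7x7x256 mid-level convolutional map fused with a 128-d Bi-LSTM-GRU state
spatial = np.random.rand(7, 7, 256)
temporal = np.random.rand(128)
fused = fuse_features(spatial, temporal)
print(fused.shape)  # (384,)
```

The fused vector would then feed a classifier head; concatenation is only one option, and weighted or attention-based fusion are common alternatives.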