Abstract

Human activity recognition in aerial videos is an emerging research area. This paper proposes a human action recognition model for UAV videos based on an Inflated 3D ConvNet (I3D) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network. The first module, the I3D network, is pre-trained on the Kinetics-400 video dataset, which comprises 400 human activity classes with around 400 video clips per class, drawn from challenging real-world YouTube videos. Built by inflating a 2D ConvNet and following the architectural design of Inception-V1, the I3D network learns and extracts spatio-temporal features from aerial video. The model then employs a Bi-LSTM architecture for human action classification on the Drone-Action dataset, a smaller benchmark dataset of UAV-captured videos. Using a SoftMax classifier, the model considerably improves on state-of-the-art activity classification results and achieves an accuracy of about 98.4%.
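
The 2D-ConvNet inflation mentioned above can be illustrated with a minimal sketch. It assumes the usual I3D-style bootstrapping rule, in which a 2D kernel is repeated along the time axis and rescaled by the temporal depth, so that a video made of one repeated frame produces the same response as the 2D filter on that frame. The helper names (`inflate_kernel`, `conv2d_valid`, `conv3d_valid`) and the toy data are hypothetical and written in plain Python for clarity; this is not the model's actual implementation.

```python
# Sketch of 2D-to-3D kernel inflation (hypothetical helper names, pure Python).

def inflate_kernel(kernel_2d, time_depth):
    """Repeat a 2D kernel time_depth times along time, rescaled by 1/time_depth."""
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv3d_valid(video, kernel):
    """Plain 'valid' 3D cross-correlation; video is indexed [t][y][x]."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t = len(video) - kt + 1
    out_h = len(video[0]) - kh + 1
    out_w = len(video[0][0]) - kw + 1
    return [
        [
            [
                sum(video[t + c][i + a][j + b] * kernel[c][a][b]
                    for c in range(kt) for a in range(kh) for b in range(kw))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        for t in range(out_t)
    ]


# Toy 4x4 frame and a 3x3 edge-like filter (illustrative values only).
frame = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 2.0],
         [2.0, 1.0, 0.0, 1.0],
         [1.0, 0.0, 2.0, 3.0]]
kernel_2d = [[1.0, 0.0, -1.0],
             [2.0, 0.0, -2.0],
             [1.0, 0.0, -1.0]]

static_video = [frame for _ in range(5)]  # the same frame repeated over time
kernel_3d = inflate_kernel(kernel_2d, time_depth=3)

response_2d = conv2d_valid(frame, kernel_2d)
response_3d = conv3d_valid(static_video, kernel_3d)  # 3 time steps of 2x2 maps
```

Every time slice of `response_3d` matches `response_2d` up to floating-point rounding, which is what lets an inflated network start from 2D weights before it is trained on video.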
