Abstract

Human activity recognition in aerial videos is an emerging research area. This paper proposes a human action recognition model for UAV videos based on an Inflated 3D ConvNet (I3D) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network. The I3D module is pre-trained on the Kinetics-400 video dataset, which comprises 400 classes of human activities with around 400 video clips per class, drawn from challenging real-world YouTube videos. The I3D network, built by inflating a 2D ConvNet and following the architectural design of Inception-V1, learns and extracts spatio-temporal features from aerial video. A Bi-LSTM architecture then performs human action classification on the Drone-Action dataset, a smaller benchmark dataset of UAV-captured videos. Using a softmax classifier, the model considerably improves on state-of-the-art results in activity classification, achieving an accuracy of about 98.4%.
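The idea of inflating a 2D ConvNet can be illustrated with a small sketch. The abstract does not spell out the inflation procedure, so the code below assumes the commonly described I3D scheme: a 2D kernel is repeated along a new time axis and rescaled by the temporal depth, so that a video made of one repeated frame produces the same response as the original 2D filter. All function and variable names (inflate_kernel, conv2d_valid, conv3d_valid, and the toy frame and kernel) are hypothetical and use plain Python lists rather than the authors' implementation.

```python
def inflate_kernel(kernel_2d, depth):
    """Repeat a 2D kernel `depth` times along time, scaling weights by 1/depth."""
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv3d_valid(video, kernel):
    """Plain 'valid' 3D cross-correlation; video is indexed [time][row][col]."""
    kt = len(kernel)
    out = []
    for t in range(len(video) - kt + 1):
        # A 3D response is the sum of 2D responses of each temporal kernel slice.
        slices = [conv2d_valid(video[t + d], kernel[d]) for d in range(kt)]
        out.append([
            [sum(s[i][j] for s in slices) for j in range(len(slices[0][0]))]
            for i in range(len(slices[0]))
        ])
    return out


# Toy data (assumed for illustration only): one 4x4 frame and a 3x3 edge filter.
frame = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 2.0],
         [2.0, 1.0, 0.0, 1.0],
         [1.0, 0.0, 2.0, 3.0]]
kernel_2d = [[1.0, 0.0, -1.0],
             [2.0, 0.0, -2.0],
             [1.0, 0.0, -1.0]]

static_video = [frame] * 5                      # the same frame repeated 5 times
kernel_3d = inflate_kernel(kernel_2d, depth=3)  # 3x3 kernel -> 3x3x3 kernel

response_2d = conv2d_valid(frame, kernel_2d)
response_3d = conv3d_valid(static_video, kernel_3d)
```

On the static video, every temporal slice of response_3d matches response_2d up to floating-point rounding, which is why pre-trained 2D weights give a sensible starting point for the 3D network. In the proposed model, the features from such a network would then be passed as a sequence to the Bi-LSTM and softmax classifier.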
