Abstract

Human actions are highly diverse and resist generalization, which makes it difficult to train a machine to recognize them. This challenge is compounded by the scarcity of datasets for aerial surveillance, since collecting and annotating a large dataset is a formidable task. This paper addresses the data-scarcity problem by introducing Aeriform in-action, a new dataset for recognizing human actions in aerial videos. The proposed dataset consists of 32 high-resolution videos spanning 13 action classes, with 55,477 frames (without augmentation) and almost 400,000 annotations. It includes complex and aggressive actions such as kicking and punching, drone-signaling actions such as waving and handshaking, and human-object interactions such as carrying and reading. In addition to the dataset, this paper presents a two-step deep learning framework that integrates a human detection module with an action recognition module. The action recognition module adopts a modified version of the ResNeXt101 architecture (M-ResNeXt101) to recognize human actions in aerial videos. The proposed M-ResNeXt101 model is compared with 13 other deep learning models and outperforms all of them, achieving an accuracy of 76.44% on the test data. The proposed dataset for human action recognition in aerial videos is available at https://surbhi-31.github.io/Aeriform-in-action/.
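To make the two-step framework concrete, the sketch below shows one plausible way such a detect-then-classify pipeline could be wired in PyTorch. It is not the authors' implementation: the M-ResNeXt101 modifications are not described in the abstract, so a stock torchvision ResNeXt-101 with a replaced 13-way classifier head stands in for the action recognition module, and an off-the-shelf Faster R-CNN person detector stands in for the human detection module.

```python
# Hypothetical sketch of the two-step pipeline from the abstract:
# (1) detect humans in a frame, (2) classify each cropped person region
# into one of the 13 Aeriform in-action classes. The stand-in models
# below are assumptions, not the paper's M-ResNeXt101.
import torch
import torchvision
from torchvision.transforms import functional as F

NUM_ACTIONS = 13  # action classes in Aeriform in-action

# Step 1: off-the-shelf person detector (stand-in for the detection module).
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

# Step 2: ResNeXt-101 backbone with a new 13-way head
# (stand-in for the paper's M-ResNeXt101).
classifier = torchvision.models.resnext101_32x8d(weights="DEFAULT")
classifier.fc = torch.nn.Linear(classifier.fc.in_features, NUM_ACTIONS)
classifier.eval()

@torch.no_grad()
def recognize_actions(frame: torch.Tensor, score_thresh: float = 0.8):
    """frame: float tensor (3, H, W) in [0, 1]. Returns [(box, action_id)]."""
    detections = detector([frame])[0]
    results = []
    for box, label, score in zip(detections["boxes"],
                                 detections["labels"],
                                 detections["scores"]):
        # Keep confident person detections only (COCO class 1 = person).
        if label.item() != 1 or score < score_thresh:
            continue
        x1, y1, x2, y2 = box.int().tolist()
        crop = F.resize(frame[:, y1:y2, x1:x2], [224, 224]).unsqueeze(0)
        action_id = classifier(crop).argmax(dim=1).item()
        results.append(((x1, y1, x2, y2), action_id))
    return results
```

In practice the classifier head would be fine-tuned on the Aeriform in-action annotations, and the detector could be swapped for any person detector suited to the small-object scale of aerial footage.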
