Abstract

Camera-equipped UAVs, or drones, are increasingly employed in a wide range of applications. Thus, ensuring their safe flight in areas containing people is a top priority. In this paper, a deep neural network-based method is proposed for the task of visual human crowd detection from UAV footage, allowing a drone to rapidly extract semantic segmentation maps from captured video frames during flight. These maps can be exploited (e.g., by a path planner) to define no-fly zones over or near human crowds and, hence, enhance UAV flight safety. To this end, a novel neural architecture for binary (crowd/non-crowd) semantic segmentation from single RGB images is proposed, based on Convolutional Neural Networks (CNNs). It consists of a semantic segmentation branch and an image-to-image translation (I2I) branch. The overall network is trained using a novel multi-task loss function that addresses both tasks by processing the output of the corresponding branch. During inference, information flows across branches through additional skip synapses to further assist the crowd detection task. To evaluate the proposed method, we introduce one real and one synthetic RGB image dataset of human crowds. The proposed method outperforms previous aerial crowd detection methods by a large margin, without any post-processing. Moreover, it demonstrates increased generalization ability, while running at real-time speed on a ground computer and at near-real-time speed on embedded AI hardware.
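As an illustration of how a binary crowd/non-crowd map could be turned into a no-fly zone, the following minimal sketch expands every crowd pixel by a safety margin. It is not the method of the paper: the function name `no_fly_mask`, the `margin` parameter, and the square neighbourhood are all assumptions made for this example, and a real pipeline would likely use an array or image-processing library and convert the margin from metres to pixels using the flight altitude and camera parameters.

```python
def no_fly_mask(crowd_mask, margin):
    """Expand a binary crowd mask by `margin` pixels in every direction.

    crowd_mask: list of rows, each a list of truthy/falsy values
                (truthy = pixel labelled as crowd by the segmentation map).
    margin:     hypothetical safety margin in pixels (non-negative int).
    Returns a new mask of booleans with the same shape.
    """
    height = len(crowd_mask)
    width = len(crowd_mask[0]) if height else 0
    out = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not crowd_mask[y][x]:
                continue
            # Mark the square neighbourhood around each crowd pixel,
            # clipped to the image borders.
            for ny in range(max(0, y - margin), min(height, y + margin + 1)):
                for nx in range(max(0, x - margin), min(width, x + margin + 1)):
                    out[ny][nx] = True
    return out


if __name__ == "__main__":
    # Toy 5x5 segmentation map with a single crowd pixel in the centre.
    seg = [[False] * 5 for _ in range(5)]
    seg[2][2] = True
    for row in no_fly_mask(seg, margin=1):
        print("".join("#" if cell else "." for cell in row))
```

A path planner could then treat every marked cell as forbidden; choosing the margin is a safety decision that this sketch leaves open.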

