Abstract

As a result of the increasing use of unmanned aerial vehicles (UAVs), large volumes of aerial video have been produced. It is unrealistic for humans to screen such big data and understand their contents; hence, methodological research on the automatic understanding of UAV videos is of paramount importance (Figure 1). In this article, we introduce a novel problem to the remote sensing community, event recognition in unconstrained aerial videos, and present the large-scale, human-annotated Event Recognition in Aerial Videos (ERA) data set, consisting of 2,864 videos, each labeled with one of 25 classes corresponding to an event unfolding over five seconds. All of these videos are collected from YouTube. The ERA data set is designed to have significant intraclass variation and interclass similarity, and it captures dynamic events in various circumstances and at dramatically varying scales. Moreover, to offer a benchmark for this task, we extensively evaluate existing deep networks. We expect that the ERA data set will facilitate further progress in automatic aerial video comprehension. The data set and trained models can be downloaded from https://lcmou.github.io/ERA_Dataset/.
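To make the data organization concrete, the following is a minimal sketch of indexing such a data set for a classification benchmark. It assumes a conventional `<root>/<class_name>/<clip>.mp4` directory layout; the actual on-disk layout of the ERA download is not specified in the abstract, so the function and layout here are illustrative assumptions only.

```python
from pathlib import Path

def index_era_clips(root):
    """Return a list of (clip_path, class_label) pairs.

    Assumes a hypothetical layout in which each of the 25 event
    classes is a subdirectory of `root` containing its video clips;
    this layout is an assumption, not documented in the abstract.
    """
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            for clip in sorted(class_dir.glob("*.mp4")):
                samples.append((str(clip), class_dir.name))
    return samples
```

A loader like this would feed the (clip, label) pairs into whatever video classification network is being benchmarked.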
