Abstract

As a result of the increasing use of unmanned aerial vehicles (UAVs), large volumes of aerial video have been produced. It is unrealistic for humans to screen such big data and understand their contents; hence, methodological research on the automatic understanding of UAV videos is of paramount importance (Figure 1). In this article, we introduce a novel problem to the remote sensing community, event recognition in unconstrained aerial videos, and present the large-scale, human-annotated Event Recognition in Aerial Videos (ERA) data set, consisting of 2,864 videos, each labeled with one of 25 classes corresponding to an event unfolding over five seconds. All of these videos are collected from YouTube. The ERA data set is designed to have significant intraclass variation and interclass similarity, and it captures dynamic events in various circumstances and at dramatically varying scales. Moreover, to offer a benchmark for this task, we extensively evaluate existing deep networks. We expect that the ERA data set will facilitate further progress in automatic aerial video comprehension. The data set and trained models can be downloaded from https://lcmou.github.io/ERA_Dataset/.
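To make the data organization concrete, the following is a minimal sketch of indexing such a data set for a classification benchmark. It assumes a conventional `<root>/<class_name>/<clip>.mp4` directory layout; the actual on-disk layout of the ERA download is not specified in the abstract, so the function and layout here are illustrative assumptions only.

```python
from pathlib import Path

def index_era_clips(root):
    """Return a list of (clip_path, class_label) pairs.

    Assumes a hypothetical layout in which each of the 25 event
    classes is a subdirectory of `root` containing its video clips;
    this layout is an assumption, not documented in the abstract.
    """
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            for clip in sorted(class_dir.glob("*.mp4")):
                samples.append((str(clip), class_dir.name))
    return samples
```

A loader like this would feed the (clip, label) pairs into whatever video classification network is being benchmarked.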
