Abstract

Object detection in large scenes is challenging due to small objects and extreme scale variation. Deep-learning-based detectors struggle to extract features from small objects that occupy only a few pixels. Most existing methods employ image pyramids and feature pyramids for multi-scale inference to alleviate this issue; however, they lack the scale awareness needed to adapt to objects of different scales. In this paper, we propose a novel Adaptive Zoom (AdaZoom) network for scale-aware object detection in large scenes. There are three main contributions. First, the Adaptive Zoom network is proposed to actively focus on regions of interest and adaptively zoom them for high-performance object detection in large scenes. Second, to tackle the problem of missing annotations for focused regions, we train AdaZoom with a reward that measures the quality of the generated regions, following the paradigm of deep reinforcement learning. Finally, we propose a collaborative training scheme that iteratively promotes the joint performance of AdaZoom and the detector. To validate its effectiveness, we conduct extensive experiments on the VisDrone2019, UAVDT, and DOTA datasets. The experiments show that AdaZoom brings consistent and significant improvements over different detection networks, achieving state-of-the-art performance on these datasets and, in particular, outperforming existing methods by 4.64% AP on VisDrone2019.
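To make the focus-and-zoom idea concrete, the following is a minimal inference-time sketch, not the authors' released code: a region policy (standing in for AdaZoom) proposes focus regions, each region is cropped and resized before being passed to the base detector, and the resulting boxes are mapped back to full-image coordinates and merged with the global-pass detections. The callables `adazoom_policy` and `detector`, the zoom size, and the box format `(x, y, w, h, score)` are all assumptions made for illustration.

```python
"""Illustrative focus-and-zoom inference loop; `adazoom_policy` and `detector`
are hypothetical callables standing in for the AdaZoom network and the base
detection network described in the paper."""
import numpy as np
import cv2  # used only to resize crops to the detector's input size


def iou(box, boxes):
    """IoU between one (x, y, w, h) box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[0] + box[2], boxes[:, 0] + boxes[:, 2])
    y2 = np.minimum(box[1] + box[3], boxes[:, 1] + boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + boxes[:, 2] * boxes[:, 3] - inter
    return inter / np.maximum(union, 1e-9)


def nms(dets, iou_thr=0.5):
    """Greedy non-maximum suppression on (x, y, w, h, score) rows."""
    dets = dets[np.argsort(-dets[:, 4])]
    keep = []
    while len(dets):
        keep.append(dets[0])
        dets = dets[1:][iou(keep[-1], dets[1:, :4]) < iou_thr] if len(dets) > 1 else dets[:0]
    return np.array(keep)


def focus_and_zoom_detect(image, adazoom_policy, detector, zoom_size=(800, 800)):
    """Run the detector on the full scene plus zoomed-in focus regions.

    adazoom_policy(image) -> list of (x, y, w, h) focus regions.
    detector(image)       -> (N, 5) array of (x, y, w, h, score) detections.
    """
    boxes = [detector(image)]                      # global pass catches large objects
    for rx, ry, rw, rh in adazoom_policy(image):   # policy-selected focus regions
        crop = cv2.resize(image[ry:ry + rh, rx:rx + rw], zoom_size)
        sx, sy = rw / zoom_size[0], rh / zoom_size[1]
        for bx, by, bw, bh, score in detector(crop):
            # map zoomed-crop detections back to full-image coordinates
            boxes.append(np.array([[rx + bx * sx, ry + by * sy,
                                    bw * sx, bh * sy, score]]))
    return nms(np.concatenate(boxes, axis=0))      # merge global and zoomed passes
```

This sketch only covers inference; the paper's contributions on reward-based reinforcement learning of the region policy and collaborative training with the detector are not reflected here.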
