MSODANet: A network for multi-scale object detection in aerial images using hierarchical dilated convolutions

Vishnu Chalavadi, Prudviraj Jeripothula, Rajeshreddy Datla, Sobhan Babu Ch, Krishna Mohan C

doi:10.1016/j.patcog.2022.108548

Abstract

Object detection in aerial images is one of the most common tasks across a wide range of computer vision applications. However, it is particularly challenging for the following reasons: (a) pixel occupancy varies among objects of different scales, (b) objects are not uniformly distributed in aerial images, (c) the appearance of an object varies with viewpoint and illumination conditions, and (d) the number of objects, even of the same type, varies across images. To address these issues, we propose a novel network for multi-scale object detection in aerial images using hierarchical dilated convolutions, called mSODANet. In particular, we probe a hierarchical dilated network that uses parallel dilated convolutions to learn contextual information about different types of objects at multiple scales and multiple fields of view. The introduced hierarchical dilated network captures the visual information of aerial images more effectively and enhances the detection capability of the model. Extensive experiments on three challenging, publicly available datasets, i.e., Visdrone2019, DOTA (OBB and HBB), and NWPU VHR-10, demonstrate the effectiveness of the proposed mSODANet, which achieves state-of-the-art performance on all three datasets.
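To make the idea of "parallel dilated convolutions at multiple fields of view" concrete, here is a minimal sketch in pure Python. It is not the mSODANet implementation: the abstract does not specify the branch count, dilation rates, kernel sizes, channel handling, or how branches are fused and stacked into a hierarchy. The sketch assumes single-channel inputs, a shared 3×3 kernel, hypothetical dilation rates (1, 2, 3), and "fusion" by simply stacking branch outputs. It also includes the standard effective kernel span of a dilated convolution, k + (k − 1)(d − 1).

```python
from typing import List, Sequence

Matrix = List[List[float]]


def effective_kernel_size(k: int, d: int) -> int:
    """Spatial span covered by a k x k kernel with dilation d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(image: Matrix, kernel: Matrix, dilation: int) -> Matrix:
    """Single-channel 2D dilated convolution (cross-correlation, as in most deep
    learning frameworks) with zero padding so the output keeps the input size.
    Assumes a square kernel with odd side length."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = (k // 2) * dilation  # padding radius that preserves spatial size
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    y = i - r + u * dilation
                    x = j - r + v * dilation
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out


def parallel_dilated_block(image: Matrix, kernel: Matrix,
                           dilations: Sequence[int] = (1, 2, 3)) -> List[Matrix]:
    """Apply the same kernel at several dilation rates in parallel.
    Hypothetical fusion: return the branch outputs stacked as a list."""
    return [dilated_conv2d(image, kernel, d) for d in dilations]


def nonzero_extent(fmap: Matrix) -> int:
    """Height of the row band containing nonzero responses."""
    rows = [i for i, row in enumerate(fmap) if any(v != 0 for v in row)]
    return rows[-1] - rows[0] + 1


if __name__ == "__main__":
    size = 7
    impulse = [[0.0] * size for _ in range(size)]
    impulse[3][3] = 1.0  # single bright pixel in the center
    ones = [[1.0] * 3 for _ in range(3)]
    for d, fmap in zip((1, 2, 3), parallel_dilated_block(impulse, ones)):
        # Each line prints: dilation, predicted span, measured span
        print(d, effective_kernel_size(3, d), nonzero_extent(fmap))
```

Running the demo prints `1 3 3`, `2 5 5`, and `3 7 7`. The same nine-parameter 3×3 kernel covers progressively larger context as the dilation grows, without adding parameters or reducing resolution. This is why parallel dilated branches are a common way to capture small and large objects within the same feature map.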
