Abstract

Object detection is one of the essential tasks in computer vision, with most detection methods relying on a limited number of sizes for anchor boxes. However, the boundaries of particular composite objects, such as ports, highways, and golf courses, are ambiguous in remote sensing images, and therefore, it is challenging for the anchor-based method to accommodate the substantial size variation of the objects. In addition, the dense placement of anchor boxes imbalances the positive and negative samples, which affects the end-to-end architecture of deep learning methods. Hence, this paper proposes a single-stage object detection model named Xnet to address this issue. The proposed method designs a deformable convolution backbone network used in the feature extraction stage. Compared to the standard convolution, it adds learnable parameters for dynamically analyzing the boundary and offset of the receptive field, rendering the model more adaptable to size variations within the same class. Moreover, this paper presents a novel anchor-free detector that classifies objects in feature images point-by-point, without relying on anchor boxes. Several experiments on the large remote sensing dataset DIOR challenging Xnet against other popular methods demonstrate that our method attains the best performance, surpassing by 4.7% on the mAP (mean average precision) metric.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call