Abstract
Object detection is one of the essential tasks in computer vision, and most detection methods rely on anchor boxes with a limited number of sizes. However, the boundaries of certain composite objects in remote sensing images, such as ports, highways, and golf courses, are ambiguous, which makes it difficult for anchor-based methods to accommodate the substantial size variation of these objects. In addition, the dense placement of anchor boxes creates an imbalance between positive and negative samples, which affects the end-to-end architecture of deep learning methods. To address these issues, this paper proposes a single-stage object detection model named Xnet. The proposed method uses a deformable convolution backbone network in the feature extraction stage. Compared with standard convolution, it adds learnable parameters that dynamically adapt the boundary and offset of the receptive field, making the model more adaptable to size variations within the same class. Moreover, this paper presents a novel anchor-free detector that classifies objects in feature maps point by point, without relying on anchor boxes. Experiments on the large remote sensing dataset DIOR, comparing Xnet against other popular methods, demonstrate that our method attains the best performance, surpassing them by 4.7% on the mAP (mean average precision) metric.
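To make the idea of a receptive field with learnable offsets concrete, the following is a minimal sketch of a 3x3 deformable convolution evaluated at a single output location. It is an illustration under stated assumptions, not the Xnet implementation: the function names `bilinear` and `deformable_conv_at` are hypothetical, the offsets are passed in by hand (in a trained network they would be predicted by an additional convolution), and out-of-range pixels are treated as zero.

```python
import math

def bilinear(img, y, x):
    """Sample a 2-D list `img` at fractional (y, x); out-of-range pixels count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    # Blend the four integer neighbours, weighted by their distance to (y, x).
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * img[yy][xx]
    return total

def deformable_conv_at(img, kernel, offsets, cy, cx):
    """Response of a 3x3 deformable convolution at output location (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th sampling point of the regular grid.
    Assumption: offsets are given directly here; a real layer would learn them.
    """
    out = 0.0
    k = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            # Regular grid position plus the per-point offset.
            out += kernel[i + 1][j + 1] * bilinear(img, cy + i + dy, cx + j + dx)
            k += 1
    return out
```

With all offsets set to zero, the function reduces to a standard 3x3 convolution (strictly, a cross-correlation) at that location. Non-zero offsets move each sampling point independently, which is what allows the receptive field to stretch or shift toward objects of different sizes.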