Abstract

Anchor-based object detectors have made accurate multi-class detection in aerial images possible. However, they place a large number of preset anchors on each image and regress the target bounding boxes from them, whereas anchor-free object detectors predict object locations directly and avoid carefully predefined anchor box parameters. Object detection in aerial images faces two main challenges: 1) the scale diversity of geospatial objects; and 2) the cluttered backgrounds of complex scenes. In this letter, to address these challenges, we present a novel Anchor-Free Network with a Density map and attention mechanism (DA²FNet). Considering the extreme density variations of the detection instances among the different categories in aerial images, the proposed DA²FNet model conducts density map estimation with image-level supervision for geospatial object counting, to acquire global knowledge about the scale information. A simple and effective image-level global counting loss function is also introduced. In addition, a compositional attention network is introduced to enhance the saliency of the foreground objects. The proposed DA²FNet method was compared with state-of-the-art object detection models and achieved excellent performance on the NWPU VHR-10, RSOD, and DOTA datasets.
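
The abstract does not give the form of the image-level global counting loss. As a rough illustration only, the sketch below assumes one predicted density map per category, takes the sum of each map as the estimated instance count, and penalizes the squared difference from the ground-truth count for that image. The function names and the squared-error form are assumptions made for illustration, not the formulation used in DA²FNet.

```python
from typing import List, Sequence

# One density map is an H x W grid of non-negative floats (hypothetical layout).
DensityMap = Sequence[Sequence[float]]


def predicted_counts(density_maps: Sequence[DensityMap]) -> List[float]:
    """Sum each per-category density map to estimate that category's count."""
    return [sum(sum(row) for row in dmap) for dmap in density_maps]


def global_counting_loss(
    density_maps: Sequence[DensityMap],
    true_counts: Sequence[float],
) -> float:
    """Mean squared error between estimated and image-level counts.

    The squared-error form is an assumption; the source only states that the
    loss is image-level and based on global counting.
    """
    estimates = predicted_counts(density_maps)
    if len(estimates) != len(true_counts):
        raise ValueError("expected one ground-truth count per density map")
    squared_errors = [(e - t) ** 2 for e, t in zip(estimates, true_counts)]
    return sum(squared_errors) / len(squared_errors)


# Two categories on a 2 x 2 grid: the first map sums to 2, the second to 0.
example_maps = [
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
example_loss = global_counting_loss(example_maps, [3, 1])
```

Because only the per-image count is supervised, a loss of this kind needs no per-pixel density annotation, which matches the image-level supervision described above.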
