The main challenge for object detection in aerial images is detecting small objects. Most existing methods use feature fusion strategies to enhance small object features in shallow layers, but they ignore the inconsistency of local-region responses to small objects across feature layers, namely the semantic gap, which may leave small object information in multiple feature layers underutilized. To address this limitation, we propose a scale enhancement module that adaptively passes valuable small object features from different feature layers to shallow layers, alleviating the semantic gap. In particular, the module includes a novel fine-coarse self-attention mechanism, which captures global contextual information through strong interaction of pixel-level information at the local scale and weak interaction of region-level information at the global scale. In addition, anchor assignment strategies based on the Intersection over Union (IoU) metric are unfavorable for small objects, because IoU tolerates less positional deviation for small objects than for large ones. For this reason, we design a dynamic anchor assignment strategy with a scale-insensitive metric that assigns adequate anchors to small objects. Extensive experiments on three aerial datasets demonstrate the effectiveness and adaptability of our method.
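The claim that IoU tolerates less positional deviation for small objects can be illustrated numerically. The following is a minimal sketch, not the paper's code: the helper names `iou` and `shifted_iou`, the box sizes (16 and 128 pixels), and the 4-pixel offset are all assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a square ground-truth box of side `size` and an
    equally sized anchor offset by `shift` pixels along both axes.
    (Hypothetical helper, for illustration only.)"""
    gt = (0.0, 0.0, float(size), float(size))
    anchor = (float(shift), float(shift), float(size + shift), float(size + shift))
    return iou(gt, anchor)


# Same 4-pixel offset applied to a small and a large object.
small = shifted_iou(16, 4)
large = shifted_iou(128, 4)
print(f"16x16 object:   IoU = {small:.3f}")
print(f"128x128 object: IoU = {large:.3f}")
```

Under these assumptions the 16x16 object scores an IoU of about 0.391 while the 128x128 object scores about 0.884 for the same offset. With a fixed positive threshold such as 0.5 (an assumed value; the abstract does not state one), the small object's anchor would be rejected and the large object's accepted, which is the imbalance a scale-insensitive metric is meant to reduce.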