Abstract
In the field of studying scale variation, the Feature Pyramid Network (FPN) replaces the image pyramid and has become one of the most popular object detection methods for detecting multi-scale objects. State-of-the-art methods have FPN inserted into a pipeline between the backbone and the detection head to enable shallow features with more semantic information. However, FPN is insufficient for object detection on various scales, especially for small-scale object detection. One of the reasons is that the features are extracted at different network depths, which introduces gaps between features. That is, as the network becomes deeper and deeper, the high-level features have more semantics but less content description. This paper proposes a new method that includes a multi-scale receptive fields extraction module, a feature constructor module, and an attention module to improve the detection efficiency of FPN for objects of various scales and to bridge the gap in content description and semantics between different layers. Together, these three modules make the detector capable of selecting the most suitable feature for objects. Especially for the attention module, this paper chooses to use a parallel structure to simultaneously extract channel and spatial attention from the same features. When we use Adopting Adaptive Training Sample Selection (ATSS) and FreeAnchor as the baseline and ResNet50 as the backbone, the experimental results on the MS COCO dataset show that our algorithm can enhance the mean average precision (mAP) by 3.7% and 2.4% compared to FPN, respectively.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have