Abstract

The multi-layer feature pyramid structure, represented by FPN, is widely used in object detection. However, due to the aliasing effect brought by up-sampling, the current feature pyramid structure still has defects, such as loss of high-level feature information and weakening of low-level small object features. In this paper, we propose FI-FPN to solve these problems, which is mainly composed of a multi-receptive field fusion (MRF) module, contextual information filtering (CIF) module, and efficient semantic information fusion (ESF) module. Particularly, MRF stacks dilated convolutional layers and max-pooling layers to obtain receptive fields of different scales, reducing the information loss of high-level features; CIF introduces a channel attention mechanism, and the channel attention weights are reassigned; ESF introduces channel concatenation instead of element-wise operation for bottom-up feature fusion and alleviating aliasing effects, facilitating efficient information flow. Experiments show that under the ResNet50 backbone, our method improves the performance of Faster RCNN and RetinaNet by 3.5 and 4.6 mAP, respectively. Our method has competitive performance compared to other advanced methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call