Abstract

Building an object detector that achieves high detection accuracy without sacrificing detection speed remains a challenging problem in object detection. This paper proposes CSA-Net, a multi-scale context information fusion model combined with a self-attention block. First, an improved backbone network, ResNet-SA, is designed with self-attention to reduce interference from the image background and focus on the object region. Second, a receptive field feature enhancement (RFFE) module is introduced to combine local and global features while enlarging the receptive field. Third, a spatial feature fusion pyramid with a symmetrical structure is adopted to fuse and propagate semantic information and feature information. Finally, a sibling detection head with an anchor-free detection mechanism is applied at the end of the model to improve both the accuracy and the speed of detection. Extensive experiments support the analysis and conclusions above. The model achieves an average accuracy of 46.8% on the COCO 2017 test set.
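
To illustrate the general idea of applying self-attention to a feature map, the following is a minimal sketch and not the ResNet-SA block itself, whose details the abstract does not give. It assumes single-head attention with identity query, key, and value projections (a trained block would learn these) and a residual connection; the function names `softmax` and `self_attention_2d` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_2d(feature_map):
    """Single-head self-attention over the spatial positions of a feature map.

    feature_map: nested list with shape [H][W][C].
    Assumption: identity query/key/value projections, so each position
    attends to every other position by raw feature similarity.
    Returns a new [H][W][C] map: input plus attended context (residual).
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])

    # Flatten the H x W grid into a sequence of C-dimensional tokens.
    tokens = [feature_map[r][c] for r in range(height) for c in range(width)]
    scale = math.sqrt(channels)

    attended = []
    for query in tokens:
        # Scaled dot-product similarity between this position and all positions.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Weighted sum of all position features gives the global context.
        context = [
            sum(w * value[ch] for w, value in zip(weights, tokens))
            for ch in range(channels)
        ]
        # Residual connection keeps the original local feature.
        attended.append([x + y for x, y in zip(query, context)])

    # Restore the [H][W][C] layout.
    return [attended[r * width:(r + 1) * width] for r in range(height)]
```

Because every position attends to all others, positions whose features resemble the object contribute more to each other's context than dissimilar background positions do, which is the intuition behind using attention to suppress background interference.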
