Abstract

With deep learning, object detection has achieved large performance improvements. However, results on small objects remain unsatisfactory because of the few features available for small objects, limitations of the network structure, sample imbalance, and other factors. To address this problem, this paper proposes a method that combines multi-scale feature fusion with dilated convolution: dilated convolution is used to expand the receptive field of feature maps at different scales, and high-level and low-level semantic information is extracted from the backbone network. The resulting feature maps with different receptive fields are fused to produce the final feature maps used for prediction. In addition, we add a series of channel attention and spatial attention mechanisms to the network to better capture the contextual information of objects in the image. Experiments show that this method achieves higher accuracy than the traditional YOLOv3 network on small object detection. Furthermore, at an input size of 640×640, it reaches 31.5% accuracy on small objects in MS COCO 2017, an improvement of 4 points over YOLOv5.
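
The abstract does not give implementation details, but the described building block (parallel dilated convolutions for receptive-field expansion, fusion of the branches, followed by channel and spatial attention) can be sketched as follows. This is a minimal PyTorch sketch under our own assumptions; the module and parameter names (`DilatedFusionBlock`, `rates`, `reduction`) and the specific layer choices are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed design)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))            # global average pool -> (N, C)
        return x * w.view(x.size(0), -1, 1, 1)     # reweight channels

class SpatialAttention(nn.Module):
    """Spatial attention from pooled channel statistics (assumed design)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class DilatedFusionBlock(nn.Module):
    """Parallel dilated convolutions enlarge the receptive field at several
    rates; the branches are concatenated, fused by a 1x1 conv, and refined
    with channel and spatial attention."""
    def __init__(self, in_channels, out_channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, 3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_channels * len(rates), out_channels, 1)
        self.ca = ChannelAttention(out_channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        feats = torch.cat([F.relu(b(x)) for b in self.branches], dim=1)
        out = F.relu(self.fuse(feats))
        return self.sa(self.ca(out))

if __name__ == "__main__":
    # Fuse a low-level (high-resolution) and an upsampled high-level feature map,
    # as in a generic multi-scale fusion neck; shapes are illustrative only.
    low = torch.randn(1, 256, 80, 80)
    high = torch.randn(1, 256, 40, 40)
    high_up = F.interpolate(high, size=low.shape[-2:], mode="nearest")
    block = DilatedFusionBlock(in_channels=512, out_channels=256)
    fused = block(torch.cat([low, high_up], dim=1))
    print(fused.shape)  # torch.Size([1, 256, 80, 80])
```

The dilation rates control how far the receptive field grows without adding parameters or reducing resolution, which is the stated motivation for using dilated convolution on the fused multi-scale features.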
