Abstract

RetinaNet performs the detection task well compared with other one-stage target detection models, but it still suffers from insufficient extraction and fusion of features from different stages, inaccurate bounding box regression, and slow network convergence. To address these problems, an improved RetinaNet target detection model is proposed. Specifically: first, Mosaic data augmentation is introduced to enrich image backgrounds and speed up training, and a Spatial Transformer Network (STN) is added to the feature extraction network; second, the original feature pyramid network is replaced with a Multi-Scale feature Fusion network (MSF), and Atrous Spatial Pyramid Pooling (ASPP) structures are added to adequately fuse multi-scale semantic and location information and enlarge the receptive field; finally, the original bounding box regression loss is replaced with the F-EIOU loss. Extensive experiments with the improved RetinaNet target detection model are conducted on the MS COCO and PASCAL VOC datasets. Compared with RetinaNet, the proposed model improves detection accuracy by 1.2% on the MS COCO dataset and average accuracy by 3.1% on the PASCAL VOC dataset.
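As a rough illustration of the bounding box regression change, the sketch below computes an EIoU-style loss with focal re-weighting for two axis-aligned boxes, assuming F-EIOU follows the commonly published Focal-EIoU formulation (1 − IoU plus center-distance, width, and height penalties, scaled by IoU^γ). The function name, box format `(x1, y1, x2, y2)`, and the default `gamma` are illustrative assumptions, not the paper's exact implementation.

```python
def focal_eiou_loss(box_a, box_b, gamma=0.5):
    """Sketch of a Focal-EIoU loss for two axis-aligned boxes (x1, y1, x2, y2).

    EIoU = 1 - IoU
         + center_dist^2 / enclosing_diag^2
         + (w_a - w_b)^2 / Cw^2
         + (h_a - h_b)^2 / Ch^2
    then re-weighted focally by IoU**gamma (gamma here is an assumed default).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection-over-union of the two boxes
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)

    # Smallest enclosing box: width Cw, height Ch, squared diagonal
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    diag2 = cw * cw + ch * ch

    # Squared distance between box centers
    dcx = (ax1 + ax2) / 2 - (bx1 + bx2) / 2
    dcy = (ay1 + ay2) / 2 - (by1 + by2) / 2
    center2 = dcx * dcx + dcy * dcy

    # Width and height differences, normalized by the enclosing box
    dw = (ax2 - ax1) - (bx2 - bx1)
    dh = (ay2 - ay1) - (by2 - by1)

    eiou = 1 - iou + center2 / diag2 + dw * dw / (cw * cw) + dh * dh / (ch * ch)
    return (iou ** gamma) * eiou
```

For a perfect match the loss is zero, and it grows as the predicted box drifts in position or shape, which directly penalizes width and height errors rather than only overlap, one of the stated motivations for moving away from the plain IoU-based regression loss.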
