Abstract

Inconsistent detection performance across objects of different scales affects many state-of-the-art object detection models. The feature pyramid network (FPN) alleviates this problem by fusing multi-scale feature maps through a top-down path. However, the feature fusion strategy used in the FPN lacks learning ability, which may result in suboptimal performance of the model. In this study, the authors propose a cross-scale feature fusion network (CSFF) to fuse low-level location feature maps with high-level semantic feature maps. The CSFF first embeds a dilated convolution layer and a deconvolution layer into the top-down path of the FPN to enhance the learning ability of feature fusion. After that, an attention module is applied to suppress distraction and interference in the feature map. Each component of the CSFF is highly decoupled and can easily cooperate with a base network in an end-to-end training manner. The authors combine the CSFF with the faster region-based convolutional neural network (Faster R-CNN) and conduct a series of experiments on the PASCAL VOC 2007 and 2012 object detection datasets. Without any bells and whistles, the CSFF achieves a considerable detection improvement over the baseline network.
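To make the described fusion step concrete, the sketch below shows one possible PyTorch implementation of a cross-scale fusion block: a dilated convolution on the low-level map, a learned deconvolution (transposed convolution) that upsamples the high-level map, and a channel-attention gate on the fused result. The module names, channel sizes, dilation rate, and the squeeze-and-excitation-style attention are assumptions made for illustration; the paper's actual CSFF design may differ.

```python
import torch
import torch.nn as nn


class CrossScaleFusionBlock(nn.Module):
    """Sketch of fusing a high-level (semantic) map into a low-level (location) map."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # Dilated convolution enlarges the receptive field of the low-level map.
        self.dilated = nn.Conv2d(channels, channels, kernel_size=3,
                                 padding=2, dilation=2)
        # Transposed convolution learns the 2x upsampling of the high-level map
        # instead of the fixed nearest-neighbour interpolation used in the FPN.
        self.deconv = nn.ConvTranspose2d(channels, channels, kernel_size=4,
                                         stride=2, padding=1)
        # Simple channel-attention gate (assumed form) to suppress distracting
        # responses in the fused feature map.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 16, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 16, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        fused = self.dilated(low) + self.deconv(high)
        return fused * self.attention(fused)


# Example: fuse a 32x32 high-level map into a 64x64 low-level map.
low = torch.randn(1, 256, 64, 64)
high = torch.randn(1, 256, 32, 32)
out = CrossScaleFusionBlock()(low, high)
print(out.shape)  # torch.Size([1, 256, 64, 64])
```

Because the block keeps the FPN's input and output shapes, it can replace the interpolation-and-add step at each pyramid level and be trained end-to-end with the base detector, as the abstract describes.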
