Abstract

Feature-pyramid network-based models, which progressively fuse multi-scale features, have proven highly effective in object detection. However, these models often learn multi-scale features with ambiguous boundaries, because small objects, which occupy only a few pixels, easily lose information during top-down propagation, making multi-scale feature representation less effective. In this work, we propose an efficient Enhanced Semantic Feature Pyramid Network (ES-FPN), which combines high-level semantic information with low-level contextual information to improve multi-scale feature learning for small object detection. Specifically, the proposed network first exploits the rich semantic information in lateral connections to make the features more semantic. Then, it recovers the information lost in high-level, low-resolution feature maps using the rich contextual information of low-level, high-resolution ones. In this way, the high-level layers lose less important contextual information during progressive feature fusion, preventing objects from disappearing and allowing their rich high-level semantics to be fully exploited. Finally, ES-FPN fuses the distributed features of each layer stage by stage, yielding final features that are more semantic and better suited for localizing objects. Extensive experimental results on three widely used object detection benchmarks (MS COCO, VOC, and Cityscapes) demonstrate that our network can accurately locate fairly complete objects with clear boundaries and outperforms previous feature pyramid-based methods.
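To make the described fusion concrete, the following is a minimal PyTorch sketch of an FPN-style top-down pathway with semantically enhanced lateral connections, in the spirit of the abstract. The module names (SemanticLateral, ESFPNSketch), the channel counts, and the specific enhancement step (a 3x3 convolution after the 1x1 projection) are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: module names, channels, and the enhancement
# step are hypothetical, not the paper's actual ES-FPN implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticLateral(nn.Module):
    """Hypothetical enhanced lateral connection: a 1x1 projection
    followed by a 3x3 conv meant to enrich the lateral semantics."""
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.project = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.enhance = nn.Conv2d(out_channels, out_channels,
                                 kernel_size=3, padding=1)

    def forward(self, x):
        return self.enhance(self.project(x))

class ESFPNSketch(nn.Module):
    """Stage-by-stage top-down fusion: each coarser map is upsampled
    and added to the enhanced lateral features of the level below."""
    def __init__(self, in_channels_list, out_channels=256):
        super().__init__()
        self.laterals = nn.ModuleList(
            SemanticLateral(c, out_channels) for c in in_channels_list
        )
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels_list
        )

    def forward(self, features):
        # features: backbone maps ordered from high to low resolution.
        laterals = [lat(f) for lat, f in zip(self.laterals, features)]
        # Top-down pass: fuse one stage at a time.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest"
            )
        return [smooth(l) for smooth, l in zip(self.smooth, laterals)]

# Usage with dummy backbone outputs (C3-C5-style maps at strides 8/16/32):
fpn = ESFPNSketch([256, 512, 1024])
feats = [torch.randn(1, 256, 64, 64),
         torch.randn(1, 512, 32, 32),
         torch.randn(1, 1024, 16, 16)]
outs = fpn(feats)
print([o.shape for o in outs])
```

In this sketch, enriching the laterals before fusion stands in for the paper's idea of injecting semantics into lateral connections; the actual ES-FPN mechanism for recovering lost high-level information from low-level context is described only in the full text.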
