Abstract

Object detection remains a challenging task due to the complexity and diversity of objects. The emergence of self-attention mechanisms provides a new avenue for feature fusion in object detection. Most existing self-attention mechanisms focus on extracting correlations between global and local information, either spatially or across channels; however, how to effectively fuse all of these features remains an open problem. To address this, we propose a Pooling and Global feature Fusion Self-attention Mechanism (PGFSM) that captures multi-level correlations among a variety of features and performs cascaded aggregation upon them. PGFSM consists of three parts: a Spatial Self-attention Pooling Fusion Module (SSPFM), a Channel Self-attention Pooling Fusion Module (CSPFM), and a Spatial and Channel Global Self-attention Fusion Module (SCGSFM). SSPFM and CSPFM extract global-max-pooling and global-average-pooling self-attention features in the spatial and channel dimensions, respectively, while SCGSFM extracts globally fused spatial-channel feature relationships. Finally, the three fused feature relations are added to the original feature to obtain an enhanced representation. In our experiments, PGFSM is embedded into the YOLOv4, YOLOv5, and EfficientDet networks and evaluated on the PASCAL VOC and MS COCO datasets. The results show that our feature fusion self-attention mechanism improves detection performance over each original framework as well as over state-of-the-art attention modules, demonstrating the effectiveness of our method.
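To make the described fusion pipeline concrete, the following is a minimal PyTorch sketch of how the three modules could be composed and added back onto the input feature. The internals here (the 7x7 spatial convolution, the reduction-16 MLP, and the non-local formulation of SCGSFM) are illustrative assumptions, not the paper's exact definitions.

```python
import torch
import torch.nn as nn


class SSPFM(nn.Module):
    """Spatial Self-attention Pooling Fusion Module (sketch).

    Fuses channel-wise global max and average pooling into a spatial
    attention map (assumed formulation, not the paper's exact one).
    """
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Pool across channels to get two (B, 1, H, W) spatial descriptors.
        max_map, _ = x.max(dim=1, keepdim=True)
        avg_map = x.mean(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return x * attn


class CSPFM(nn.Module):
    """Channel Self-attention Pooling Fusion Module (sketch).

    Fuses spatial global max and average pooling into channel weights.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        max_vec = x.amax(dim=(2, 3))   # (B, C) global max pooling
        avg_vec = x.mean(dim=(2, 3))   # (B, C) global average pooling
        attn = torch.sigmoid(self.mlp(max_vec) + self.mlp(avg_vec))
        return x * attn.view(b, c, 1, 1)


class SCGSFM(nn.Module):
    """Spatial and Channel Global Self-attention Fusion Module (sketch).

    A single-head, non-local-style attention over flattened spatial
    positions, capturing joint spatial-channel global relationships.
    """
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)        # (B, HW, C)
        k = self.k(x).flatten(2)                        # (B, C, HW)
        v = self.v(x).flatten(2).transpose(1, 2)        # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # (B, HW, HW)
        return (attn @ v).transpose(1, 2).view(b, c, h, w)


class PGFSM(nn.Module):
    """Adds the three fused attention outputs back onto the input feature."""
    def __init__(self, channels):
        super().__init__()
        self.sspfm = SSPFM()
        self.cspfm = CSPFM(channels)
        self.scgsfm = SCGSFM(channels)

    def forward(self, x):
        return x + self.sspfm(x) + self.cspfm(x) + self.scgsfm(x)


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    print(PGFSM(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch the module would be dropped in after a backbone or neck feature map; since the output shape matches the input, it can be inserted into YOLOv4, YOLOv5, or EfficientDet without altering surrounding layers.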
