Abstract

Due to their inherent characteristics, small objects have weaker feature representations after repeated down-sampling and can even be submerged in the background. FPN's simple feature concatenation does not fully exploit multi-scale information and introduces irrelevant context during feature transfer, further degrading small-object detection performance. To address these issues, we propose the simple but effective FE-YOLOv5. (1) We design a feature enhancement module (FEM) to capture more discriminative features of small objects. Global attention and high-level global contextual information are used to guide the shallow, high-resolution features: global attention captures cross-dimensional feature interactions and reduces information loss, while the high-level context supplies richer semantic detail by modeling global relationships with a non-local network. (2) We design a spatially aware module (SAM) to filter spatial information and improve the robustness of features. Its deformable convolution performs sparse sampling and adaptive spatial learning to better focus on foreground objects. Experimental results show that FE-YOLOv5 outperforms other architectures on the VisDrone2019 and Tsinghua-Tencent 100K datasets, improving APS over YOLOv5 by 2.8% and 2.9%, respectively.
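
Below is a minimal PyTorch sketch of the two ideas the abstract describes, assuming the standard YOLOv5 (PyTorch) codebase: a non-local block that models global relationships in a high-level feature map (the contextual guidance used by the FEM) and a deformable-convolution layer for SAM-style adaptive spatial sampling. Class names, channel sizes, and wiring are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d


class NonLocalContext(nn.Module):
    """Non-local block: models global pairwise relationships so high-level
    context can guide shallow, high-resolution features (FEM-style guidance)."""

    def __init__(self, channels, reduction=2):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, 1)  # query projection
        self.phi = nn.Conv2d(channels, inter, 1)    # key projection
        self.g = nn.Conv2d(channels, inter, 1)      # value projection
        self.out = nn.Conv2d(inter, channels, 1)    # restore channel dimension

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)        # (B, HW, C')
        k = self.phi(x).flatten(2)                           # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)             # (B, HW, C')
        attn = F.softmax(q @ k, dim=-1)                      # global affinity (B, HW, HW)
        ctx = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(ctx)                             # residual connection


class SpatialAwareModule(nn.Module):
    """SAM-style spatial filtering: offsets predicted from the input let a
    deformable convolution sample sparsely and adapt to foreground objects."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (x, y) per kernel position
        self.offset = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                kernel_size, padding=pad)
        self.dcn = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):
        offsets = self.offset(x)          # learned sampling locations
        return F.relu(self.dcn(x, offsets))


if __name__ == "__main__":
    feat = torch.randn(1, 256, 40, 40)            # a mid-level feature map
    print(NonLocalContext(256)(feat).shape)       # torch.Size([1, 256, 40, 40])
    print(SpatialAwareModule(256)(feat).shape)    # torch.Size([1, 256, 40, 40])
```

In a pipeline like the one described, the non-local context would be computed on a deep, low-resolution level and fused into shallower FPN levels, while the deformable layer would sit in the fusion path to suppress background responses; the exact placement here is an assumption.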
