A Vision Enhancement and Feature Fusion Multiscale Detection Network

Chengwu Qian, Jiangbo Qian, Chong Wang, Xulun Ye, Caiming Zhong

doi:10.1007/s11063-024-11471-w

Open Access

Journal: Neural Processing Letters
Publication Date: Feb 7, 2024
License type: CC BY 4.0

Abstract

Real-world scenes often contain heavy occlusion, which readily degrades the accuracy of object detectors. Most current detectors use a convolutional neural network (CNN) as the backbone, but two problems arise: CNNs are not robust when objects are occluded, and the missing object pixels make conventional convolution ineffective at extracting features, which lowers detection accuracy. To address these two problems, we propose VFN (A Vision Enhancement and Feature Fusion Multiscale Detection Network). VFN first builds a multiscale backbone from different stages of the Swin Transformer. It then applies a vision enhancement module based on dilated convolution, which enlarges the field of view of feature points at each scale and compensates for the missing pixels. Finally, a feature guidance module fuses the features across scales so that each scale is enhanced by the others. On both the PASCAL VOC dataset and the CrowdHuman dataset, VFN achieves higher overall accuracy than other methods and is also better at finding occluded objects, demonstrating the effectiveness of our method. The code is available at https://github.com/qcw666/vfn.
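As a rough illustration of why dilated convolution widens the field of view of a feature point, the sketch below implements a plain 1D dilated convolution in pure Python. It is not the VFN vision enhancement module; the function names `effective_kernel_size` and `dilated_conv1d`, the toy signal, and the kernel are all assumptions made for this example. The only fact it relies on is that a kernel of size k with dilation d spans d(k - 1) + 1 input positions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1D cross-correlation with a dilated kernel.

    Hypothetical helper for illustration only; kernel taps are placed
    `dilation` positions apart, so the same number of weights covers a
    wider stretch of the input.
    """
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        # Each tap i reads the input at offset i * dilation from `start`.
        value = sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        outputs.append(value)
    return outputs


signal = [1, 2, 3, 4, 5, 6, 7]   # toy input, assumed for the example
kernel = [1, 0, -1]              # three weights in both cases

dense = dilated_conv1d(signal, kernel, dilation=1)  # each output sees 3 inputs
wide = dilated_conv1d(signal, kernel, dilation=2)   # each output sees 5 inputs

print(effective_kernel_size(3, 1), dense)
print(effective_kernel_size(3, 2), wide)
```

With the same three weights, the dilated kernel covers five input positions instead of three, so each output draws on a wider neighborhood. That wider context is the property a dilation-based module can use when some nearby pixels of an object are hidden.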

Full Text