Abstract

To address the challenge of salient object detection in complex scenes, this study proposes an RGB-T salient object detection method called CMFF. The method exploits the full potential of RGB and thermal infrared images through an encoder-decoder structure and cross-modal multi-scale feature fusion. In the encoding stage, two VGG16 backbone networks extract multi-level features, which are enhanced by CBAM attention modules and combined through a stepwise fusion scheme; an L1-norm fusion strategy assigns weights to the two modalities to strengthen their complementarity. In the decoding stage, a pyramid pooling module (PPM) extracts global features from the high-level fused features, and the low-level fused features are combined during up-sampling with the multi-scale features from the encoding stage, enriching both the global and local information of the feature maps. Comparison experiments on the publicly available VT5000 dataset show that the method achieves an F-measure of 0.863 and a mean absolute error (MAE) of 0.062, a clear improvement in overall detection performance over six existing methods.
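The L1-norm weighting mentioned above can be sketched as follows. This is a minimal illustration, assuming (as is common in fusion work, though not spelled out in the abstract) that each modality's weight is its feature map's L1 norm normalized over both modalities; the function names and toy feature maps are illustrative, not part of CMFF itself.

```python
def l1_norm(feat):
    """Sum of absolute activations: the L1 norm of a 2-D feature map."""
    return sum(abs(v) for row in feat for v in row)

def l1_weighted_fusion(feat_rgb, feat_t, eps=1e-8):
    """Fuse RGB and thermal feature maps with L1-norm-derived weights.

    Each modality's weight is its L1 norm divided by the total, so the
    modality with stronger activations contributes more to the fusion,
    and the two weights always sum to (approximately) one.
    """
    n_rgb, n_t = l1_norm(feat_rgb), l1_norm(feat_t)
    total = n_rgb + n_t + eps  # eps guards against division by zero
    w_rgb, w_t = n_rgb / total, n_t / total
    return [[w_rgb * r + w_t * t for r, t in zip(row_r, row_t)]
            for row_r, row_t in zip(feat_rgb, feat_t)]

# Toy 2x2 feature maps: RGB activations dominate, so w_rgb > w_t.
rgb = [[1.0, 2.0], [3.0, 2.0]]       # L1 norm = 8
thermal = [[1.0, 1.0], [1.0, 1.0]]   # L1 norm = 4
fused = l1_weighted_fusion(rgb, thermal)
```

In this toy case the RGB map receives weight 8/12 and the thermal map 4/12, so the fused map leans toward the RGB features while still retaining thermal information.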
