Abstract
The detection performance of RGB salient object detection (SOD) degrades when faced with complex backgrounds, whereas a depth map can supply the RGB image with spatial information that clearly distinguishes foreground from background, giving RGB-D SOD an advantage over RGB SOD. Most existing RGB-D SOD models are built on convolutional neural networks (CNNs), while a few are built on transformer architectures. However, convolutional operations cannot establish long-range dependencies, and the shallow features extracted by transformer-based models lack detailed information, which may lead to imprecise localization of the object. In this work, a novel RGB-D SOD network combining a CNN and a transformer is proposed, in which salient objects are detected via joint learning and multi-feature fusion. The size of the proposed network is reduced, making feature extraction more efficient, while the multi-feature fusion architecture explores the commonality of features from the two modalities at different scale levels. The proposed model is tested on six popular RGB-D SOD datasets; its performance is evaluated with four metrics and compared against other state-of-the-art models. Experimental results demonstrate that the proposed model outperforms its rivals in accuracy and fine-detail preservation.
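To make the CNN-transformer fusion idea concrete, the following is a minimal sketch of one plausible cross-modal fusion block: CNN-style convolutions merge RGB and depth features at a single scale level, and a small transformer encoder then models the long-range dependencies that convolution alone cannot capture. The class name `CrossModalFusion` and all hyperparameters here are illustrative assumptions, not the paper's actual module.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical fusion block (not the paper's exact design): merges
    RGB and depth features at one scale level, then applies a single
    transformer encoder layer over the spatial tokens."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Convolutional channel-wise fusion of the two modalities.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # One transformer layer establishes long-range dependencies
        # across all spatial positions of the fused feature map.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb_feat.shape
        fused = self.fuse(torch.cat([rgb_feat, depth_feat], dim=1))  # (B, C, H, W)
        tokens = fused.flatten(2).transpose(1, 2)                    # (B, H*W, C)
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Usage: fuse assumed 64-channel RGB and depth features at one scale.
if __name__ == "__main__":
    block = CrossModalFusion(channels=64)
    rgb = torch.randn(1, 64, 40, 40)
    depth = torch.randn(1, 64, 40, 40)
    out = block(rgb, depth)
    print(out.shape)  # torch.Size([1, 64, 40, 40])
```

In a multi-feature fusion architecture of the kind the abstract describes, a block like this would typically be instantiated once per scale level of the backbone, so that the commonality between the two modalities is exploited at every resolution.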