Abstract
Underwater salient object detection (USOD) has garnered increasing attention owing to its importance in a wide range of underwater visual tasks. Despite this growing interest, research on USOD remains in its nascent stages, and existing methods often struggle to capture long-range contextual features of salient objects. Additionally, these methods frequently overlook the complementary nature of multimodal information. Fusing multimodal information can render previously indiscernible objects detectable, since complementary features captured from diverse source images enable a more complete depiction of objects. In this work, we explore an approach that integrates RGB and depth information, coupled with interactive feature enhancement, to advance the detection of underwater salient objects. Our method first leverages the strengths of both transformer and convolutional neural network architectures to extract features from the source images, employing a two-stage training strategy designed to optimize feature fusion. Subsequently, we utilize self-attention and cross-attention mechanisms to model the correlations among the extracted features, thereby amplifying the most relevant ones. Finally, to fully exploit features across different network layers, we introduce a cross-scale learning strategy that facilitates multi-scale feature fusion, improving detection accuracy by generating both coarse and fine salient predictions. Extensive experiments demonstrate that our method achieves state-of-the-art performance.
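To make the attention-based fusion step concrete, below is a minimal PyTorch sketch of self- and cross-attention applied to RGB and depth token features. It assumes the backbones have already been flattened into token sequences of a shared embedding dimension; the module name, dimensions, and fusion-by-summation choice are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: self-attention refines each modality, then cross-attention
# lets each modality query the other for complementary cues. Assumed names.
import torch
import torch.nn as nn


class RGBDAttentionFusion(nn.Module):
    """Enhance and fuse RGB and depth features via self-/cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.self_attn_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn_depth = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-attention: each modality attends to the other, so e.g. depth
        # contours can sharpen low-contrast RGB regions, and vice versa.
        self.cross_attn_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn_depth = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb, depth: (batch, tokens, dim) features from the two backbones.
        rgb = rgb + self.self_attn_rgb(rgb, rgb, rgb)[0]          # intra-modal context
        depth = depth + self.self_attn_depth(depth, depth, depth)[0]
        rgb_c = rgb + self.cross_attn_rgb(rgb, depth, depth)[0]   # RGB queries depth
        depth_c = depth + self.cross_attn_depth(depth, rgb, rgb)[0]  # depth queries RGB
        return self.norm(rgb_c + depth_c)                         # fused representation


fused = RGBDAttentionFusion()(torch.randn(2, 196, 256), torch.randn(2, 196, 256))
print(fused.shape)  # torch.Size([2, 196, 256])
```

In a cross-scale setting, one such module could be applied per backbone stage, with the fused maps at different resolutions combined to produce the coarse and fine saliency predictions mentioned above.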