Abstract

Existing state-of-the-art methods for salient object detection have achieved significant progress mainly by integrating multi-level features. However, most of them simply resize the multi-level features to the same spatial resolution and aggregate them by channel-wise concatenation or element-wise addition. As a result, they are limited to integrating features of adjacent layers because of the large discrepancy between these multi-level features. In addition, most existing methods adopt a U-shape architecture in which high-level semantic information is gradually transmitted to shallower layers to suppress background noise. Nonetheless, little attention has been paid to refining the high-level features themselves. In this paper, we propose BINet to solve the above problems. Specifically, we design a feature interaction module (FIM) to exploit complementary information between multiple features, which is then used to refine these features simultaneously. We also propose a cascaded bidirectional interaction decoder (CBID) to further refine the multi-level features iteratively. The CBID progressively refines multi-level features with feedback mechanisms and employs a channel attention module (CAM) to assign larger weights to channels that respond more strongly to salient regions. Equipped with these modules, BINet segments salient regions accurately and quickly. Experimental results on six widely used benchmark datasets show that BINet outperforms 16 other state-of-the-art methods in terms of 7 standard evaluation metrics. Our code will be publicly available at https://github.com/clelouch/BINet.
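
To make the idea of channel attention concrete, the following is a minimal sketch, not the CAM used in BINet, whose internals the abstract does not specify. It assumes a squeeze-and-excitation-style gate: each channel is globally average-pooled, a single linear layer followed by a sigmoid turns the pooled values into per-channel weights, and each channel is scaled by its weight. The function name `channel_attention`, the `weight` and `bias` parameters, and the toy feature map are all hypothetical and chosen only for illustration.

```python
import math


def channel_attention(features, weight, bias):
    """Reweight the channels of a feature map stored as nested lists [C][H][W].

    Assumption: a squeeze-and-excitation-style gate with one linear layer.
    This is an illustrative stand-in, not the CAM defined in the paper.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]

    # Excite: linear layer + sigmoid yields a gate in (0, 1) for each channel.
    gates = []
    for w_row, b in zip(weight, bias):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Scale: multiply every value in a channel by that channel's gate.
    refined = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, features)
    ]
    return refined, gates


# Toy input: channel 0 responds strongly, channel 1 does not respond at all.
features = [
    [[4.0, 4.0], [4.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical (untrained) layer weights
refined, gates = channel_attention(features, identity, [0.0, 0.0])
```

With these toy weights, the strongly responding channel receives the larger gate, which mirrors the behaviour the abstract attributes to the CAM. In a trained network the linear layer would be learned, so which channels are emphasized depends on the data rather than on raw activation magnitude alone.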
