Highlights
• We present a novel Local-to-Global Feature (L2GF) learning network for salient object detection.
• We design an L-Net and a G-Net to learn local and global contexts from low-level and high-level features, respectively.
• L-Net extracts coarse local context in its first stage and models fine-grained details in its second stage.
• G-Net is modeled from a sequence-to-sequence prediction perspective to obtain global context.
• We build a simple yet effective fusion branch (F-Net) to aggregate the local and global contexts for the final prediction.

Abstract
Existing works mainly focus on aggregating multi-level features for salient object detection, which may yield sub-optimal results due to interference from redundant details. To address this problem, we aim to learn a local-to-global feature representation that segments detailed structures from a local perspective and locates salient objects from a global perspective. In particular, we design a novel L2GF network consisting of three modules: L-Net, G-Net, and F-Net. L-Net employs our enhanced auto-encoder structure to extract local contexts that provide rich boundary information about objects, learning rich local features within a limited receptive field. G-Net takes tokenized feature patches as an input sequence and leverages the well-known Transformer structure to extract global contexts, which help capture the relationships among multiple salient regions and produce more complete saliency results. F-Net performs a coarse-to-fine fusion: it takes the features and maps of both the local and global branches as inputs and computes the final high-quality saliency map. Extensive experiments on five benchmark datasets demonstrate that our L2GF network performs favorably against state-of-the-art approaches.
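To make the three-branch design concrete, the following is a minimal PyTorch sketch of the local/global/fusion layout described above. All module internals, channel widths, the patch size, and the fusion wiring are illustrative assumptions on our part, not the paper's actual implementation.

```python
# Hypothetical sketch of a three-branch local-to-global network in the spirit
# of L2GF. Every architectural detail below (channels, depth, patch size) is
# an assumption for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LNet(nn.Module):
    """Local branch: a small encoder-decoder over low-level features."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, in_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Local context at the same spatial resolution as the input features.
        return self.decoder(self.encoder(x))


class GNet(nn.Module):
    """Global branch: tokenize feature patches, run a Transformer encoder."""
    def __init__(self, in_ch=64, dim=256, patch=8, depth=4, heads=8):
        super().__init__()
        self.patch = patch
        # Non-overlapping patch embedding, as in a standard vision Transformer.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.proj = nn.Conv2d(dim, in_ch, 1)

    def forward(self, x):
        tokens = self.embed(x)                    # B x dim x H/p x W/p
        b, d, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)   # B x (h*w) x dim token sequence
        seq = self.transformer(seq)               # global self-attention
        feat = seq.transpose(1, 2).reshape(b, d, h, w)
        feat = F.interpolate(feat, scale_factor=self.patch,
                             mode="bilinear", align_corners=False)
        return self.proj(feat)                    # global context, upsampled back


class FNet(nn.Module):
    """Fusion branch: aggregate local and global contexts into a saliency map."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, 1, 1),
        )

    def forward(self, local_feat, global_feat):
        # Concatenate both contexts and predict a single-channel saliency map.
        return torch.sigmoid(self.fuse(torch.cat([local_feat, global_feat], dim=1)))


if __name__ == "__main__":
    x = torch.randn(1, 64, 64, 64)                # a stand-in backbone feature map
    l_net, g_net, f_net = LNet(), GNet(), FNet()
    saliency = f_net(l_net(x), g_net(x))
    print(saliency.shape)                         # torch.Size([1, 1, 64, 64])
```

In this sketch the fusion is a single concatenation followed by convolutions; the paper's F-Net is described as a coarse-to-fine process over both features and maps, so a faithful implementation would iterate this refinement rather than fuse once.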