Abstract

Traditional salient object detection models fall into several classes, most of which rely on low-level features and contrast between pixels. In this paper, we propose a model based on a multilevel deep pyramid (MLDP), which fuses multiple features at different levels. First, the MLDP feeds the original image into a VGG16 model to extract high-level features and form an initial saliency map. Next, the MLDP further processes these high-level features to form a saliency map based on a deep pyramid. Then, by extracting low-level features, the MLDP obtains a saliency map fused with superpixels. After that, the MLDP applies background noise filtering to the superpixel-fused saliency map to suppress the interference of background noise, yielding a saliency map based on the foreground. Lastly, the MLDP combines the superpixel-fused saliency map with the foreground-based saliency map to produce the final saliency map. Because it fuses features from multiple levels, the MLDP is not limited to low-level features and achieves good results in extracting salient targets. As shown in our experiment section, the MLDP outperforms seven state-of-the-art models on three public saliency datasets, demonstrating its effectiveness and wide applicability in the extraction of salient targets.
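The abstract walks through a fusion-then-filtering pipeline. The following is a minimal numpy sketch of those last three steps, with hypothetical function names and a hypothetical threshold (`thresh`), since the paper's summary does not specify the exact fusion weights or filtering rule:

```python
import numpy as np

def filter_background(saliency, thresh=0.5):
    """Suppress low-saliency pixels assumed to be background noise.
    `thresh` is a hypothetical cutoff relative to the map's maximum."""
    foreground = saliency.copy()
    foreground[foreground < thresh * foreground.max()] = 0.0
    return foreground

def fuse_maps(map_a, map_b, alpha=0.5):
    """Pixel-wise weighted fusion of two saliency maps, rescaled to [0, 1]."""
    fused = alpha * map_a + (1.0 - alpha) * map_b
    span = fused.max() - fused.min()
    return (fused - fused.min()) / span if span > 0 else fused

# Synthetic 8x8 maps standing in for the real deep-pyramid and superpixel maps.
rng = np.random.default_rng(0)
deep_map = rng.random((8, 8))
superpixel_map = rng.random((8, 8))

fused = fuse_maps(deep_map, superpixel_map)   # saliency map fused with superpixels
foreground = filter_background(fused)         # saliency map based on the foreground
final = fuse_maps(fused, foreground)          # final saliency map
```

This is only a sketch of the data flow; the actual MLDP operates on full images and learned features rather than random arrays.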

Highlights

  • Visual saliency aims to extract the most significant regions and targets in a scene by simulating the human visual attention system

  • The global contrast-based salient model [4] divides the image into several small image regions, and the contrast between the small image regions is used to highlight salient targets

  • Using only Gaussian pyramid and VGG16 model information, and ignoring local contrast and low-level feature information, the initial global saliency map cannot extract the details of salient targets


Summary

Introduction

Visual saliency aims to extract the most significant regions and targets in a scene by simulating the human visual attention system. In the classical model, low-level features such as color, orientation, and brightness are extracted in separate channels. This model simulates the biological center-surround suppression mechanism of the human visual system and obtains a saliency map through multiscale feature fusion, but it extracts only low-level features. In our approach, we divide the initial saliency map into six different scales to form an initial saliency map pyramid; in this way, we can compare the multiscale images and extract local contrast features. For the resulting feature map, a "winner-takes-all" policy and inhibition of return are used to produce the saliency map based on the deep pyramid. Our model uses the VGG16 deep network to extract high-level features, so it achieves better feature extraction accuracy than shallow models. We creatively use the VGG16 model to extract high-level features and form an initial saliency map as the input of the deep pyramid structure. Superpixel segmentation is based on low-level features, and these low-level features are added to the saliency map based on the deep pyramid so that low-level features are extracted simultaneously.
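The six-scale pyramid described above can be sketched with plain numpy. This is a minimal stand-in: real Gaussian-pyramid reduction smooths with a Gaussian kernel before downsampling, whereas here we use 2x2 mean pooling for brevity, and the initial saliency map is a random array rather than a VGG16 output:

```python
import numpy as np

def build_pyramid(sal_map, levels=6):
    """Build an initial saliency map pyramid by repeated 2x2 mean
    downsampling (a simple stand-in for Gaussian-pyramid reduction)."""
    pyramid = [sal_map]
    for _ in range(levels - 1):
        cur = pyramid[-1]
        h, w = cur.shape
        h2, w2 = h - h % 2, w - w % 2   # crop to even size before pooling
        cur = cur[:h2, :w2]
        coarser = cur.reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))
        pyramid.append(coarser)
    return pyramid

# Stand-in for the initial saliency map produced by the VGG16 stage.
init = np.random.default_rng(1).random((64, 64))
pyr = build_pyramid(init)  # six scales: 64x64 down to 2x2
```

Comparing corresponding pixels across the six levels then gives the local contrast cues, after which a winner-takes-all selection over the fused map would pick the most salient location.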

Our Approach
Experiments
Conclusions