Abstract

Human eye movement is one of the most important functions for understanding our surroundings. When the human eye processes a scene, it quickly focuses on its dominant parts, a process commonly known as visual saliency detection or visual attention prediction. Recently, neural networks have been used to predict visual saliency. This paper proposes a deep learning encoder-decoder architecture, based on transfer learning, to predict visual saliency. In the proposed model, visual features are extracted from raw images through convolutional layers to predict visual saliency. In addition, the model builds on the VGG-16 network for semantic segmentation, using a pixel classification layer to predict the categorical label of every pixel in an input image. The model is applied to several datasets, including TORONTO, MIT300, MIT1003, and DUT-OMRON, to demonstrate its efficiency. Its results are compared quantitatively and qualitatively to classic and state-of-the-art deep learning models. The proposed model achieves a global accuracy of up to 96.22% for visual saliency prediction.
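The abstract mentions a pixel classification layer that assigns a categorical label to every pixel of the decoder's output. A minimal sketch of that step, assuming the decoder emits one channel of class scores per pixel (shapes and names here are illustrative, not taken from the paper):

```python
import numpy as np

def pixel_classify(scores: np.ndarray):
    """Hypothetical per-pixel classification: scores of shape (H, W, C)
    -> (probabilities, label map). A softmax over the class (last) axis
    turns scores into a categorical distribution per pixel; argmax picks
    the predicted label."""
    # Numerically stable softmax over the class axis.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    labels = probs.argmax(axis=-1)  # (H, W) categorical label per pixel
    return probs, labels

# Toy 2x2 feature map with 3 classes (e.g., salient / background / other).
scores = np.array([[[2.0, 0.1, 0.1], [0.1, 3.0, 0.1]],
                   [[0.1, 0.1, 1.5], [4.0, 0.1, 0.1]]])
probs, labels = pixel_classify(scores)
print(labels.tolist())               # [[0, 1], [2, 0]]
print(np.allclose(probs.sum(-1), 1))  # True: each pixel's probs sum to 1
```

For binary saliency prediction, the same layer reduces to two classes per pixel, and the "salient" probability channel can be read directly as a saliency map.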

Highlights

  • Humans have a strong ability to pay attention to a specific part of an image instead of processing the entire image

  • This study aims to propose the application of a semantic segmentation model based on the VGG-16 network to predict human visual attention in the field of view

  • We first qualitatively tested the proposed model on the SALICON dataset, then evaluated it on the TORONTO, MIT300, MIT1003, and DUT-OMRON datasets


Summary

Introduction

Humans have a strong ability to attend to a specific part of an image instead of processing the entire image. This phenomenon of visual attention has been studied for over a century [1]. Computational models of attention are commonly divided into bottom-up and top-down approaches. Top-down approaches are task-oriented and try to locate a target object from a specific category; they depend on the features of the object of interest [4,5]. Bottom-up and top-down approaches are mainly driven by the visual characteristics of a scene and the task of interest, respectively [6,7].
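To make the bottom-up notion concrete, here is a minimal sketch (not the paper's model) of a classic bottom-up cue: saliency as local intensity contrast, where a pixel is salient to the extent it deviates from the mean of its neighborhood. All names and the kernel size are illustrative assumptions:

```python
import numpy as np

def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Mean filter with a k x k box; edges padded by reflection."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def contrast_saliency(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Center-surround contrast: |pixel - local mean|, scaled to [0, 1].
    High values mark regions that stand out from their surroundings,
    independent of any task -- the defining property of a bottom-up cue."""
    s = np.abs(img - box_blur(img, k))
    return s / s.max() if s.max() > 0 else s

# A dark image with one bright spot: the spot dominates the saliency map.
img = np.zeros((9, 9))
img[4, 4] = 1.0
sal = contrast_saliency(img)
peak = tuple(map(int, np.unravel_index(sal.argmax(), sal.shape)))
print(peak)  # (4, 4)
```

A top-down model, by contrast, would weight such feature maps by knowledge of the target category rather than by raw contrast alone.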

