Abstract

Visual systems estimate the three-dimensional (3D) structure of scenes from information in two-dimensional (2D) retinal images. Visual systems use multiple sources of information to improve the accuracy of these estimates, including statistical knowledge of the probable spatial arrangements of natural scenes. Here, we examine how 3D surface tilts are spatially related in real-world scenes, and show that humans pool information across space when estimating surface tilt in accordance with these spatial relationships. We develop a hierarchical model of surface tilt estimation that is grounded in the statistics of tilt in natural scenes and images. The model computes a global tilt estimate by pooling local tilt estimates within an adaptive spatial neighborhood. The spatial neighborhood in which local estimates are pooled changes according to the value of the local estimate at a target location. The hierarchical model provides more accurate estimates of ground-truth tilt in natural scenes and provides a better account of human performance than the local estimates. Taken together, the results imply that the human visual system pools information about surface tilt across space in accordance with natural scene statistics.
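
The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the elliptical neighborhood rule, its axis lengths, the image-coordinate convention, and the use of an unweighted circular mean are all assumptions made for illustration. The sketch only shows the general idea that the local estimate at the target location selects the neighborhood, and that tilt, being a circular variable, is averaged on the circle rather than arithmetically.

```python
import math


def circular_mean_deg(angles_deg):
    """Circular mean of angles in degrees, wrapped to [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def elliptical_offsets(local_tilt_deg, long_axis=3.0, short_axis=1.0):
    """Hypothetical neighborhood rule (an assumption, not from the paper):
    an ellipse whose long axis is aligned with the local tilt direction.
    Returns (dy, dx) pixel offsets that fall inside the ellipse."""
    theta = math.radians(local_tilt_deg)
    reach = int(math.ceil(max(long_axis, short_axis)))
    offsets = []
    for dy in range(-reach, reach + 1):
        for dx in range(-reach, reach + 1):
            # Rotate the offset into the ellipse's own coordinate frame.
            u = dx * math.cos(theta) + dy * math.sin(theta)
            v = -dx * math.sin(theta) + dy * math.cos(theta)
            if (u / long_axis) ** 2 + (v / short_axis) ** 2 <= 1.0:
                offsets.append((dy, dx))
    return offsets


def pooled_tilt(local_tilts, row, col, neighborhood=elliptical_offsets):
    """Pool local tilt estimates around (row, col).

    local_tilts is a 2D list of local estimates in degrees. The
    neighborhood is chosen from the local estimate at the target
    location, which makes the pooling region adaptive."""
    n_rows, n_cols = len(local_tilts), len(local_tilts[0])
    samples = []
    for dy, dx in neighborhood(local_tilts[row][col]):
        r, c = row + dy, col + dx
        if 0 <= r < n_rows and 0 <= c < n_cols:  # skip out-of-image offsets
            samples.append(local_tilts[r][c])
    return circular_mean_deg(samples)
```

Passing the neighborhood rule as a parameter keeps the sketch agnostic about how the pooling region actually depends on the local estimate; a rule derived from measured scene statistics could be substituted for the placeholder ellipse.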

Highlights

  • Estimating three-dimensional (3D) surface orientation from two-dimensional (2D) retinal images is one of the most critical functions of human vision [1]

  • Visual systems estimate three-dimensional (3D) properties of scenes from two-dimensional images on the retinas. To solve this difficult problem as accurately as possible, visual systems use many available sources of information, including information about how the 3D properties of the world are spatially arranged. This manuscript reports a systematic analysis of 3D surface tilt in natural scenes, a model of surface tilt estimation that makes use of these scene statistics, and human psychophysical data on the estimation of surface tilt from natural images

  • The results show that the regularities present in the natural environment predict both how to maximize the accuracy of tilt estimation and how to maximize the prediction of human performance


Introduction

Estimating three-dimensional (3D) surface orientation from two-dimensional (2D) retinal images is one of the most critical functions of human vision [1]. To understand how the estimation of 3D surface orientation works in the real world, it can be useful to study performance with stimuli that are as natural as possible. The computer vision literature frequently models the use of spatial context, but infrequently provides insights into the computations that may underlie human performance [24]. There have been attempts to develop quantitative models that capture the impact of spatial context on human perception, but the stimuli that these models apply to are often rather artificial [6,25]. There have been multiple demonstrations that global context influences human perception of surface orientation in real-world 3D scenes [26], but these studies typically do not provide quantitative models that account for human performance.

