Rethinking the Role of Top-Down Attention in Vision: Effects Attributable to a Lossy Representation in Peripheral Vision

Ruth Rosenholtz, Krista A. Ehinger, Jie Huang

doi:10.3389/fpsyg.2012.00013

Abstract

According to common wisdom in the field of visual perception, top-down selective attention is required in order to bind features into objects. In this view, even simple tasks, such as distinguishing a rotated T from a rotated L, require selective attention since they require feature binding. Selective attention, in turn, is commonly conceived as involving volition, intention, and at least implicitly, awareness. There is something non-intuitive about the notion that we might need so expensive (and possibly human) a resource as conscious awareness in order to perform so basic a function as perception. In fact, we can carry out complex sensorimotor tasks, seemingly in the near absence of awareness or volitional shifts of attention (“zombie behaviors”). More generally, the tight association between attention and awareness, and the presumed role of attention in perception, are problematic. We propose that under normal viewing conditions, the main processes of feature binding and perception proceed largely independently of top-down selective attention. Recent work suggests that there is a significant loss of information in early stages of visual processing, especially in the periphery. In particular, our texture tiling model (TTM) represents images in terms of a fixed set of “texture” statistics computed over local pooling regions that tile the visual input. We argue that this lossy representation produces the perceptual ambiguities that have previously been ascribed to a lack of feature binding in the absence of selective attention. At the same time, the TTM representation is sufficiently rich to explain performance in such complex tasks as scene gist recognition, pop-out target search, and navigation. A number of phenomena that have previously been explained in terms of voluntary attention can be explained more parsimoniously with the TTM.
In this model, peripheral vision introduces a specific kind of information loss, and the information available to an observer varies greatly depending upon shifts of the point of gaze (which usually occur without awareness). The available information, in turn, provides a key determinant of the visual system’s capabilities and deficiencies. This scheme dissociates basic perceptual operations, such as feature binding, from both top-down attention and conscious awareness.

Highlights

Our senses gather copious amounts of data, seemingly far more than our minds can fully process at once.
If early visual representation is in terms of a fixed set of summary statistics, computed over pooling regions that grow with eccentricity, then for typical search displays many of those pooling regions will contain more than a single item.
We have suggested that these results can be explained more parsimoniously by a newer model of the processing in early vision, in which the visual system represents its inputs by a rich set of summary statistics.
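The second highlight, on pooling regions that grow with eccentricity, can be made concrete with a small geometric sketch. The code below is only an illustration under stated assumptions, not the TTM itself: it assumes circular pooling regions whose diameter is about half the eccentricity (a Bouma-style scaling chosen here for illustration), and it only counts items per region rather than computing any texture statistics. The names `pooling_radius`, `items_in_region`, and `ring_display` are hypothetical helpers introduced for this example.

```python
import math

# Assumption for illustration: pooling-region diameter ~ 0.5 x eccentricity.
BOUMA_FACTOR = 0.5


def pooling_radius(eccentricity_deg, factor=BOUMA_FACTOR):
    """Radius (deg) of a hypothetical circular pooling region at this eccentricity."""
    return 0.5 * factor * eccentricity_deg


def items_in_region(center, items, fixation=(0.0, 0.0)):
    """Count the items that fall inside the pooling region centered at `center`.

    The region's size depends on how far `center` lies from the point of gaze.
    """
    eccentricity = math.dist(center, fixation)
    radius = pooling_radius(eccentricity)
    return sum(1 for item in items if math.dist(item, center) <= radius)


def ring_display(n_items, ring_eccentricity_deg):
    """Place n_items evenly on a circle around the origin (a toy search display)."""
    return [
        (
            ring_eccentricity_deg * math.cos(2 * math.pi * k / n_items),
            ring_eccentricity_deg * math.sin(2 * math.pi * k / n_items),
        )
        for k in range(n_items)
    ]


if __name__ == "__main__":
    sparse = ring_display(8, 10.0)
    dense = ring_display(32, 10.0)
    # Central fixation: each region in the dense display also captures neighbors.
    print(items_in_region(sparse[0], sparse))
    print(items_in_region(dense[0], dense))
    # Fixating the item itself shrinks its region so that it is isolated.
    print(items_in_region(dense[0], dense, fixation=dense[0]))
```

In this toy setup, a sparse display leaves one item per region, a dense display puts several items into the same region, and moving the point of gaze onto an item isolates it again. This mirrors the claim that the available information depends strongly on where the eyes are pointed.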

Summary

Introduction

Our senses gather copious amounts of data, seemingly far more than our minds can fully process at once. Our conscious experience when looking at a street scene (e.g., Figure 1C) consists of first noticing, perhaps, a one-way sign, a pedestrian, a tree next to the sidewalk. It seems as if we switch our awareness between them. The mechanism behind this experience of shifting the focus of our awareness has been called selective attention. Selective attention has been intimately linked with conscious awareness. James (1890) said of attention that “focalization, concentration, of consciousness are of its essence.” The precise relationship between consciousness and attention has remained unclear.

Methods

Results

Conclusion