Segmenting visual scenes into distinct objects and surfaces is a fundamental visual process, with stereoscopic depth and motion serving as crucial cues. However, how the visual system uses these cues to segment multiple objects is not fully understood. We investigated how neurons in the middle-temporal (MT) cortex of macaque monkeys represent overlapping surfaces at different depths, moving in different directions. Neuronal activity was recorded from three male monkeys during discrimination tasks under varying attention conditions. We found that neuronal responses to overlapping surfaces showed a robust bias toward the binocular disparity of one surface over the other. The disparity bias of a neuron was positively correlated with the neuron's disparity preference for a single surface. In two animals, neurons preferring near disparities of single surfaces (near neurons) showed a near bias for overlapping stimuli, while neurons preferring far disparities (far neurons) showed a far bias. In the third animal, both near and far neurons displayed a near bias, though the near neurons showed a stronger near bias. All three animals exhibited an initial near bias across neurons relative to the average of the responses to the individual surfaces. Although attention modulated neuronal responses, the disparity bias was not caused by attention. We also found that the effect of attention was consistent with object-based rather than feature-based attention. We propose a model in which the pool size of the neuron population that weighs the responses to individual stimulus components can vary. This model is a novel extension of the standard normalization model and provides a unified explanation for the disparity bias across animals. Our results reveal how MT neurons encode multiple stimuli moving at different depths and present new evidence of response modulation by object-based attention.
The disparity bias allows subgroups of neurons to preferentially represent individual surfaces of multiple stimuli at different depths, thereby facilitating segmentation.
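The idea of a variable weighting pool can be illustrated with a minimal sketch. This is not the authors' model: the Gaussian tuning curves, the Gaussian pool membership, the sign convention (negative disparity = near), and all function names and parameter values are assumptions chosen only to show how pool size could change the weighting of two stimulus components in a normalization-style weighted average.

```python
import numpy as np

def tuning(pref, disparity, sigma=0.5):
    """Gaussian disparity tuning (assumed shape, arbitrary units)."""
    return np.exp(-0.5 * ((disparity - pref) / sigma) ** 2)

def component_weights(neuron_pref, d1, d2, pool_width, prefs):
    """Weight each component by the mean response of a pool of neurons.

    Pool membership is a Gaussian centered on the neuron's own preference
    (an assumption). A small pool_width pools only similarly tuned neurons;
    a very large pool_width approximates a broad, untuned pool.
    """
    membership = tuning(neuron_pref, prefs, sigma=pool_width)
    w1 = np.sum(membership * tuning(prefs, d1)) / np.sum(membership)
    w2 = np.sum(membership * tuning(prefs, d2)) / np.sum(membership)
    return w1, w2

def pair_response(neuron_pref, d1, d2, pool_width, prefs):
    """Weighted average of the responses to the two single components."""
    r1, r2 = tuning(neuron_pref, d1), tuning(neuron_pref, d2)
    w1, w2 = component_weights(neuron_pref, d1, d2, pool_width, prefs)
    return (w1 * r1 + w2 * r2) / (w1 + w2)

# Hypothetical population and stimulus values (not taken from the study).
PREFS = np.linspace(-2.0, 2.0, 81)   # preferred disparities of the population
NEAR, FAR = -0.5, 0.5                # disparities of the two surfaces

for width in (0.3, 100.0):
    w_near, w_far = component_weights(NEAR, NEAR, FAR, width, PREFS)
    print(f"pool width {width}: near-weight fraction "
          f"{w_near / (w_near + w_far):.3f}")
```

In this toy setup, a narrow pool gives a near-preferring neuron a larger weight for the near surface and a far-preferring neuron a larger weight for the far surface, whereas a very broad pool weights the two components almost equally, so the response approaches the average of the single-surface responses. The sketch does not attempt to reproduce the overall near bias reported for the third animal.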