Abstract
A biologically inspired model architecture for inferring 3D shape from texture is proposed. The model is hierarchically organized into modules that roughly correspond to visual cortical areas in the ventral stream. Initial orientation-selective filtering decomposes the input into low-level orientation and spatial frequency representations. Grouping of spatially anisotropic orientation responses builds sketch-like representations of surface shape. Gradients in the orientation fields, followed by integration, yield local surface geometry and a globally consistent 3D depth. From the distribution of orientation responses summed over frequency, an estimate of the tilt and slant of the local surface can be obtained. The model thus suggests how 3D shape can be inferred from texture patterns and their image appearance in a hierarchically organized processing cascade along the cortical ventral stream. The proposed model integrates oriented texture gradient information encoded in distributed maps of orientation-frequency representations. The texture energy gradient is defined by changes in the grouped, summed, and normalized orientation-frequency response activity extracted from the textured object image. This activity is integrated by directed fields to generate a 3D shape representation of a complex object, with depth ordering proportional to the fields' output: higher activity denotes larger relative depth away from the viewer.
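The pipeline described above can be illustrated with a minimal sketch: an oriented Gabor filter bank stands in for the orientation-frequency decomposition, the summed normalized response energy is differentiated, and the gradient is integrated into a relative depth map. This is an assumption-laden toy, not the authors' implementation; the filter parameters, the divisive normalization, and the one-axis cumulative integration are all simplifications chosen for brevity.

```python
import numpy as np
from scipy.signal import fftconvolve


def gabor_kernel(theta, freq, size=21, sigma=4.0):
    """Oriented Gabor filter approximating a V1 simple cell.
    Kernel size, sigma, and the cosine phase are illustrative assumptions."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * freq * xr)


def texture_energy(image, thetas, freqs):
    """Squared filter responses summed over orientation and frequency,
    followed by a simplified global divisive normalization."""
    energy = np.zeros_like(image, dtype=float)
    for theta in thetas:
        for freq in freqs:
            response = fftconvolve(image, gabor_kernel(theta, freq), mode="same")
            energy += response**2
    return energy / (energy.max() + 1e-12)


def relative_depth(energy):
    """Integrate the vertical texture-energy gradient into a relative depth
    map; as in the model, higher values denote larger distance from the
    viewer. Integrating only along one axis is a deliberate simplification."""
    gy, _gx = np.gradient(energy)
    depth = np.cumsum(gy, axis=0)
    return depth - depth.min()
```

For example, `relative_depth(texture_energy(img, [0, np.pi/2], [0.1, 0.2]))` maps a grayscale image to a nonnegative depth map of the same shape.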
Highlights
The construction of a neural representation of the 3D shape structure of an object from the monocular 2D information available in the retinal image is one of the challenging tasks of biological visual systems.
To demonstrate the functionality of the newly proposed model architecture, we show several results below.
The model produces two types of final output: a 2D sketch showing object boundaries and occlusion borders, and a 3D mesh whose values indicate the relative depth of each surface location, with low activity corresponding to surfaces closer to the viewer and high activity to surfaces farther away.
Summary
The construction of a neural representation of the 3D shape structure of an object from the monocular 2D information available in the retinal image is one of the challenging tasks of biological visual systems. The representation of depth structure can be computed from various visual cues such as binocular disparity, kinetic motion, and texture gradients. Depth-related information can be extracted from a single monocular image from the distortions imposed on the texture by the object's structure and the distance of the surface from the camera. These distortions depend on the material of the object, the type of projection, depth differences, and the slant and tilt of the surface region. The visual system exploits neural sensitivity to gradients of such distortions present in the distribution of neural responses.
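The dependence of the texture distortion on slant and tilt can be sketched with a standard shape-from-texture relation: for a slanted surface, the tilt direction is the image direction along which the dominant texture frequency changes fastest, and the rate of that change serves as a proxy for slant magnitude. The following is a simplified illustration under these assumptions, not the paper's neural computation; the slant proxy and the input frequency map are hypothetical.

```python
import numpy as np


def tilt_and_slant(freq_map):
    """From a map of dominant spatial frequency, estimate local tilt
    (direction of fastest frequency increase) and a slant proxy
    (relative rate of frequency change). Both definitions are
    illustrative simplifications of texture-gradient cues."""
    gy, gx = np.gradient(freq_map)
    tilt = np.arctan2(gy, gx)                       # gradient direction
    slant = np.hypot(gx, gy) / (freq_map + 1e-12)   # relative change per pixel
    return tilt, slant
```

On a frequency map that increases linearly down the image (as produced by a surface receding upward), the estimated tilt is vertical (pi/2) everywhere and the slant proxy is strictly positive.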