Abstract

When a child lives in the real world, from infancy to adulthood, his retinae receive a flood of stereo sensory streams. His muscles produce another action stream. How does the child's brain deal with such big data from multiple sensory modalities (left- and right-eye modalities) and multiple effector modalities (location, disparity map, and shape type)? The brain incrementally learns to produce simple-to-complex sensorimotor behaviors, a process known as autonomous development. We present a model that incrementally fuses such an open-ended, life-long stream and updates the "brain" online so that the perceived world is 3D. Traditional shape-from-X methods use a particular type of cue X (e.g., stereo disparity, shading) to compute depths or local shapes based on a handcrafted physical model. Such a model likely results in a brittle system because the availability of the cue fluctuates. An embodiment of the Developmental Network (DN), called the Stereo Where-What Network (WWN-8), learns to perform simultaneous attention and recognition while developing invariances in location, disparity, shape, and surface type, so that multiple cues can automatically fill in when a particular type of cue (e.g., texture) is locally missing from the real world. We report experiments on: 1) dynamic synapse retraction and growth as a method of developing receptive fields; 2) training for recognizing 3D objects directly in cluttered natural backgrounds; and 3) integration of depth perception with location and type information. The experiments used stereo images and motor actions on the order of 10^5 frames. Potential applications include driver assistance for road safety, mobile robots, autonomous navigation, and autonomous vision-guided manipulators.
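The dynamic synapse retraction and growth mentioned in experiment 1) can be illustrated with a toy sketch. This is not the paper's actual update rule; the function name, the per-synapse `deviation` statistic, and the outlier threshold `k` below are all illustrative assumptions. The general idea shown: a neuron retracts synapses whose input match is a statistical outlier among its active synapses, and regrows previously retracted synapses that now match well, thereby shaping its receptive field over time.

```python
import numpy as np

def maintain_synapses(w, mask, deviation, k=1.5):
    """One illustrative maintenance step for a single neuron.

    w         : weight vector, one entry per potential synapse
    mask      : 0/1 vector, 1 = synapse currently connected
    deviation : running mismatch between each synapse's weight and its input
    k         : outlier threshold in standard deviations (assumed value)
    """
    active = mask.astype(bool)
    mu = deviation[active].mean()          # mean mismatch of active synapses
    sd = deviation[active].std()           # spread of that mismatch
    new_mask = mask.copy()
    # Retract: active synapses whose mismatch is an outlier are cut.
    new_mask[active & (deviation > mu + k * sd)] = 0
    # Grow: retracted synapses that now match better than average reconnect.
    new_mask[~active & (deviation < mu)] = 1
    # Retracted synapses contribute nothing to the neuron's response.
    return new_mask, w * new_mask
```

For example, a neuron with four synapses, one of which consistently mismatches its input, would retract that synapse and keep the other three, shrinking its receptive field to the well-matching region.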
