Abstract
In its early stages, the visual system suffers from substantial ambiguity and noise that severely limit the performance of early vision algorithms. This article presents feedback mechanisms between early visual processes, such as perceptual grouping, stereopsis and depth reconstruction, that allow the system to reduce this ambiguity and improve the early representation of visual information. In the first part, the article proposes a local perceptual grouping algorithm that, in addition to commonly used geometric information, makes use of a novel multi-modal measure between local edge/line features. The grouping information is then used to: 1) disambiguate stereopsis by enforcing that stereo matches preserve groups; and 2) correct the reconstruction error due to image pixel sampling using a linear interpolation over the groups. The integration of mutual feedback between early vision processes is shown to considerably reduce ambiguity and noise without the need for global constraints.
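The idea that stereo matches should preserve groups can be illustrated with a minimal sketch. It is not the article's algorithm: the function name, the data layout (primitive identifiers, lists of grouping links) and the acceptance rule (a candidate is kept if at least one grouped neighbour in the left image has a candidate that is grouped with it in the right image) are all assumptions made for illustration.

```python
from collections import defaultdict


def filter_matches_by_groups(candidates, left_links, right_links):
    """Keep stereo candidates that are consistent with grouping links.

    candidates:  dict mapping a left primitive id to a set of candidate
                 right primitive ids (hypothetical layout).
    left_links:  iterable of (a, b) pairs of grouped left primitives.
    right_links: iterable of (a, b) pairs of grouped right primitives.
    """
    # Store right-image links as unordered pairs for symmetric lookup.
    right_pairs = {frozenset(link) for link in right_links}

    # Neighbour lookup for the left image.
    left_neighbours = defaultdict(set)
    for a, b in left_links:
        left_neighbours[a].add(b)
        left_neighbours[b].add(a)

    kept = {}
    for left_id, right_ids in candidates.items():
        neighbours = left_neighbours.get(left_id, set())
        if not neighbours:
            # Assumption: without grouping evidence, keep every candidate.
            kept[left_id] = set(right_ids)
            continue
        supported = set()
        for r in right_ids:
            # r is supported if some grouped left neighbour has a candidate
            # that is grouped with r in the right image.
            if any(
                frozenset((r, rn)) in right_pairs
                for n in neighbours
                for rn in candidates.get(n, ())
            ):
                supported.add(r)
        kept[left_id] = supported
    return kept


# Usage: L1 has two candidates, but only R1 is grouped with L2's match R2.
candidates = {"L1": {"R1", "R9"}, "L2": {"R2"}}
result = filter_matches_by_groups(
    candidates, left_links=[("L1", "L2")], right_links=[("R1", "R2")]
)
```

In this toy case the ambiguous candidate `R9` is discarded using only local grouping information, without any global constraint over the image.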
Highlights
Both human and machine perception involve a progressive abstraction of visual information, from the raw signal provided by the eyes or the cameras towards symbolic, object-centric representations [1].
A large amount of work on signal processing and invariant feature descriptors [3] has led to significant progress on tasks like navigation [4] and object recognition [5].
The contributions of this paper are threefold: first, we propose a local perceptual grouping mechanism making full use of the multi-modal and semantic information carried by the visual primitives; second, we propose a stereo matching scheme for primitives, allowing for the reconstruction of the 3D equivalent of 2D primitives; third, we investigate how perceptual grouping reduces ambiguities in the reconstructed 3D representation.
Summary
Both human and machine perception involve a progressive abstraction of visual information, from the raw signal provided by the eyes or the cameras towards symbolic, object-centric representations [1]. One notable attempt, by Nevatia and colleagues [6,7], makes use of a feature hierarchy for stereo reconstruction. Another notable class of systems is model-based vision, where a large amount of world knowledge is available and is used to disambiguate and interpret the visual signal. One problem with the latter approach is that the large amount of ambiguity and noise present in images can cause an early extraction of symbolic features to fail, and such failures are difficult to correct. The use of sophisticated models in vision introduces more bias into the system, whereas signal-based approaches lead to more variance.