Abstract

Event-based sensing, i.e., the asynchronous detection of luminance changes, promises low-energy, high-dynamic-range, and sparse sensing. This stands in contrast to whole-image frame-wise acquisition by standard cameras. Here, we systematically investigate the implications of event-based sensing in the context of visual motion, or flow, estimation. Starting from a common theoretical foundation, we discuss different principal approaches for optical flow detection, ranging from gradient-based methods through plane-fitting to filter-based methods, and identify the strengths and weaknesses of each class. Gradient-based methods for local motion integration are shown to suffer from the sparse encoding in address-event representations (AER). Approaches exploiting the local plane-like structure of the event cloud, on the other hand, are shown to be well suited. Within this class, filter-based approaches are shown to define a proper detection scheme which can also deal with the problem of representing multiple motions at a single location (motion transparency). A novel, efficient, biologically inspired motion detector is proposed, analyzed, and experimentally validated. Furthermore, a stage of surround normalization is incorporated. Together with the filtering, this defines a canonical circuit for motion feature detection. The theoretical analysis shows that such an integrated circuit reduces motion ambiguity in addition to decorrelating the representation of motion-related activations.
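
To make the plane-fitting idea mentioned above concrete, the following minimal Python sketch (written for this summary, not the paper's implementation; the function names and the synthetic edge example are illustrative assumptions) fits a plane to a small (x, y, t) event neighborhood and derives the normal flow from the plane's slope.

    # Minimal sketch (not the paper's implementation): fitting a plane to a local
    # (x, y, t) event neighborhood and reading off the normal flow. Events produced
    # by a moving edge lie approximately on a plane t = a*x + b*y + c; the gradient
    # (a, b) = (dt/dx, dt/dy) is the inverse speed along the edge normal, so the
    # normal flow is v = (a, b) / (a^2 + b^2).
    import numpy as np

    def fit_local_plane(xs, ys, ts):
        """Least-squares fit of t = a*x + b*y + c to a small event neighborhood."""
        A = np.column_stack([xs, ys, np.ones_like(xs)])
        (a, b, c), *_ = np.linalg.lstsq(A, ts, rcond=None)
        return a, b, c

    def normal_flow(xs, ys, ts, eps=1e-9):
        """Normal flow (pixels per time unit) derived from the fitted plane slope."""
        a, b, _ = fit_local_plane(xs, ys, ts)
        return np.array([a, b]) / (a * a + b * b + eps)

    # Synthetic check: a vertical edge sweeping rightward at 2 px per time unit
    # generates events whose timestamps grow linearly with x (t = x / 2).
    xs = np.repeat(np.arange(10.0), 5)
    ys = np.tile(np.arange(5.0), 10)
    ts = xs / 2.0
    print(normal_flow(xs, ys, ts))  # approximately [2.0, 0.0]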

Highlights

  • The initial stages of visual processing extract a vocabulary of relevant feature items related to a visual scene

  • The proposed mechanism has been motivated from the perspective of sampling the plenoptic function such that specific temporal changes in the optic array are registered by the sensory device

  • Compared to plane-fitting models, we have shown that our model has the advantage of encoding multiple motion directions at a single location, e.g., transparent motion (Figure 10; compare, e.g., Snowden et al., 1991; Treue et al., 2000; see Raudies and Neumann, 2010; Raudies et al., 2011 for a detailed discussion of motion transparency computation); an illustrative sketch of this idea follows this list
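
The sketch below (an illustrative assumption for this summary, not the detector proposed in the paper) shows the basic idea behind such a population code: each local velocity estimate excites a bank of direction-tuned channels, so two superimposed motions at the same location yield two separate response peaks instead of a single averaged vector.

    # Illustrative sketch: a bank of direction-tuned channels (cosine tuning,
    # half-wave rectified) driven by local velocity estimates at one location.
    import numpy as np

    n_channels = 16
    preferred = np.linspace(0.0, 2.0 * np.pi, n_channels, endpoint=False)

    def channel_responses(velocities, sharpness=4.0):
        """Sum of tuning-curve responses over all local velocity estimates."""
        resp = np.zeros(n_channels)
        for vx, vy in velocities:
            theta = np.arctan2(vy, vx)          # motion direction
            speed = np.hypot(vx, vy)
            tuning = np.maximum(np.cos(preferred - theta), 0.0) ** sharpness
            resp += speed * tuning
        return resp

    # Two superimposed motions (rightward and upward) at the same location
    # produce two distinct peaks, at channel 0 and channel n_channels // 4.
    print(np.round(channel_responses([(1.0, 0.0), (0.0, 1.0)]), 2))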

Introduction

The initial stages of visual processing extract a vocabulary of relevant feature items related to a visual scene. Rays of light reach the observer’s eye and are transformed into internal representations. This can be formalized as sampling the ambient optic array (Gibson, 1978, 1986). The plenoptic function P(θ, φ, λ, t, Vx, Vy, Vz) describes the intensity of a light ray of wavelength λ passing through the center of the pupil of an idealized eye at every possible angle (θ, φ), located at the position (Vx, Vy, Vz) at time t (Adelson and Bergen, 1991). Conventional frame-based cameras sample the optic array by reading out measurements of all light-sensitive pixels at a fixed rate. Since the temporal sampling rate is limited by reading out all pixel values within a fixed time interval, fast local luminance changes are integrated over time and cannot be distinguished in further processing. Address-event representations (AER), on the other hand, originate from image sensors in which pixels operate at individual rates, generating events based on local decisions to emit an output response, much like in the mammalian retina (Mead, 1990; Liu and Delbruck, 2010).
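
As a concrete illustration of this sampling principle, the short Python sketch below (written for this summary; the function name, threshold value, and frame-based input are assumptions, not a model of any particular sensor) converts a conventional frame sequence into an address-event stream by thresholding per-pixel log-luminance changes.

    # Minimal sketch (an assumption, not a specific sensor model): emulating
    # event generation by thresholding per-pixel log-luminance changes. In
    # contrast to frame-based readout, only pixels whose luminance changed
    # emit output, and each event carries its own timestamp.
    import numpy as np

    def frames_to_events(frames, timestamps, threshold=0.15):
        """Emit (t, x, y, polarity) tuples whenever a pixel's log intensity
        changes by more than `threshold` since that pixel's last event."""
        log_ref = np.log(frames[0] + 1e-6)            # per-pixel reference level
        events = []
        for frame, t in zip(frames[1:], timestamps[1:]):
            log_i = np.log(frame + 1e-6)
            diff = log_i - log_ref
            ys, xs = np.nonzero(np.abs(diff) >= threshold)
            for x, y in zip(xs, ys):
                events.append((t, x, y, int(np.sign(diff[y, x]))))
                log_ref[y, x] = log_i[y, x]           # reset reference only where an event fired
        return events

    # A static scene produces no events at all; a fast local change produces a
    # burst of events at exactly those pixels: the sparse, asynchronous
    # sampling of the optic array described above.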
