Abstract

Depth estimation in real-world applications requires precise responses to fast motion and challenging lighting conditions. Event cameras use bio-inspired, event-driven sensors that provide instantaneous, asynchronous measurements of pixel-level log-intensity changes, making them well suited to depth estimation under such challenging conditions. However, because event cameras produce asynchronous and spatially sparse data, it is difficult to obtain accurate dense disparity maps in stereo event-camera setups, especially when estimating disparities on local structures or edges. In this study, we develop a novel deep event stereo network that reconstructs spatial intensity image features from embedded event streams and leverages the event features together with the reconstructed image features to compute dense disparity maps. To this end, we propose a novel event-to-image translation network with a cross-semantic attention mechanism that computes the global semantic context of the event features for intensity image reconstruction. In addition, we develop a feature aggregation module for accurate disparity estimation that modulates the event features with the reconstructed image features through a stacked dilated spatially-adaptive denormalization mechanism. Experimental results show that our method outperforms state-of-the-art methods by significant margins in both quantitative and qualitative measures.
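For concreteness, the sketch below illustrates, in PyTorch, the two mechanisms the abstract names: a cross-attention layer in which queries from the image-reconstruction decoder gather global semantic context from the event features, and a stack of dilated spatially-adaptive denormalization (SPADE-style) blocks that modulate event features with reconstructed image features. Class names, channel sizes, dilation rates, and the query/key assignment are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossSemanticAttention(nn.Module):
    # Hypothetical layout: image-decoder features query the event features
    # to collect global semantic context for intensity reconstruction.
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, event_feat, decoder_feat):
        # Flatten (B, C, H, W) maps into (B, H*W, C) token sequences.
        b, c, h, w = decoder_feat.shape
        q = decoder_feat.flatten(2).transpose(1, 2)   # queries: image decoder
        kv = event_feat.flatten(2).transpose(1, 2)    # keys/values: events
        ctx, _ = self.attn(q, kv, kv)                 # global context per query
        out = self.norm(q + ctx)                      # residual + layer norm
        return out.transpose(1, 2).reshape(b, c, h, w)

class SPADEBlock(nn.Module):
    # One spatially-adaptive denormalization step (after Park et al., 2019):
    # normalize the event features, then scale/shift them with per-pixel
    # gamma/beta maps predicted from the reconstructed image features.
    def __init__(self, event_ch: int, image_ch: int, hidden: int = 64, dilation: int = 1):
        super().__init__()
        self.norm = nn.InstanceNorm2d(event_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(image_ch, hidden, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
        )
        self.gamma = nn.Conv2d(hidden, event_ch, 3, padding=dilation, dilation=dilation)
        self.beta = nn.Conv2d(hidden, event_ch, 3, padding=dilation, dilation=dilation)

    def forward(self, event_feat, image_feat):
        # Resize the image guidance to the event feature resolution if needed.
        if image_feat.shape[-2:] != event_feat.shape[-2:]:
            image_feat = F.interpolate(image_feat, size=event_feat.shape[-2:],
                                       mode='bilinear', align_corners=False)
        h = self.shared(image_feat)
        return self.norm(event_feat) * (1.0 + self.gamma(h)) + self.beta(h)

class StackedDilatedSPADE(nn.Module):
    # Stack SPADE blocks with growing dilation so the modulation aggregates
    # image guidance over increasingly large receptive fields.
    def __init__(self, event_ch: int, image_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        self.blocks = nn.ModuleList(
            SPADEBlock(event_ch, image_ch, dilation=d) for d in dilations
        )

    def forward(self, event_feat, image_feat):
        for block in self.blocks:
            event_feat = block(event_feat, image_feat)
        return event_feat

# Example shapes: 64-channel features at 32x48 resolution (batch of 2).
ev = torch.randn(2, 64, 32, 48)
im = torch.randn(2, 64, 32, 48)
ctx = CrossSemanticAttention(64)(ev, im)      # context for reconstruction
fused = StackedDilatedSPADE(64, 64)(ev, im)   # image-guided event features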
