Abstract
Bio-inspired event cameras have been considered effective alternatives to traditional frame-based cameras for stereo depth estimation, especially in challenging conditions such as low-light or high-speed environments. Recently, deep learning-based supervised event stereo matching methods have achieved significant performance improvements over the traditional event stereo methods. However, the supervised methods depend on ground-truth disparity maps for training, and it is difficult to secure a large amount of ground-truth disparity maps. A feasible alternative is to devise an unsupervised event stereo method that can be trained without ground-truth disparity maps. To this end, we propose the first unsupervised event stereo matching method that can predict dense disparity maps, and is trained by transforming the depth estimation problem into a warping-based reconstruction problem. We propose a novel unsupervised loss function that enforces the network to minimize the feature-level epipolar correlation difference between the ground-truth intensity images and warped images. Moreover, we propose a novel event embedding mechanism that utilizes both temporal and spatial neighboring events to capture spatio-temporal relationships among the events for stereo matching. Experimental results reveal that the proposed method outperforms the baseline unsupervised methods by significant margins (e.g., up to 16.88% improvement) and achieves comparable results with the existing supervised methods. Extensive ablation studies validate the efficacy of the proposed modules and architectural choices.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have