Abstract

Time-of-Flight (ToF) sensors and stereo vision systems are both widely used for capturing depth data. The two modalities have complementary strengths and limitations, which prior research has exploited to produce more accurate depth maps by fusing data from the two sources. However, none of these diverse fusion approaches provides an end-to-end neural network solution. In this work, we propose the first end-to-end ToF and stereo data fusion network, built on a coarse-to-fine matching framework in which the ToF depth prior is integrated into stereo matching by constraining the disparity search range to an interval around the depth measured by the ToF camera. We adopt a dynamic search range for each pixel according to an estimated ToF error map, which is more efficient and effective than a constant range when handling errors of varying magnitude. The ToF error map is predicted by an error estimator that branches off the stereo matching network, so ToF error estimation and stereo matching are performed jointly, with the two tasks assisting each other. We also propose an upsampling module that replaces the naive bilinear upsampling in the coarse-to-fine stereo matching network and reduces the error introduced by upsampling. The proposed network is trained end-to-end on synthetic datasets and generalizes to real-world datasets without further fine-tuning. Experimental results show that our fusion method achieves higher accuracy than either ToF or stereo alone and outperforms state-of-the-art fusion methods on both synthetic and real data.
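
To make the core mechanism concrete, the sketch below illustrates how a ToF depth measurement and an estimated ToF error map could define a per-pixel disparity search interval. This is our illustration under stated assumptions, not the authors' implementation: the function name, the linear error-to-width mapping, and the parameters min_halfwidth and error_scale are hypothetical.

```python
import torch

def tof_guided_search_range(tof_depth, tof_error, focal, baseline,
                            min_halfwidth=1.0, error_scale=4.0):
    """Derive a per-pixel disparity interval from a ToF prior.

    tof_depth : (B, 1, H, W) ToF depth measurements.
    tof_error : (B, 1, H, W) estimated per-pixel ToF error map; a larger
                estimated error widens the interval so stereo matching has
                room to correct the ToF measurement.
    focal, baseline : scalars of the rectified stereo rig.
    """
    # Disparity implied by the ToF depth: d = f * b / z (pinhole stereo model).
    center = focal * baseline / tof_depth.clamp(min=1e-6)
    # Dynamic half-width (assumed form): a floor plus a term that grows
    # linearly with the estimated ToF error.
    halfwidth = min_halfwidth + error_scale * tof_error
    d_min = (center - halfwidth).clamp(min=0.0)
    d_max = center + halfwidth
    return d_min, d_max
```

The matching cost volume would then be built only over disparities inside [d_min, d_max] at each pixel, which is cheaper than a full-range volume and, where the ToF prior is reliable, also narrows matching to the correct neighborhood.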
