Abstract

Monocular depth estimation is an important computer vision task widely explored in fields such as autonomous driving and robotics. Recently, deep optics approaches that optimize diffractive optical elements (DOEs) within differentiable frameworks have improved depth estimation performance. However, they consider only on-axis point spread functions (PSFs) and depend heavily on system calibration. We propose a precise end-to-end paradigm that combines ray tracing with angular spectrum diffraction. With this approach, we jointly train a DOE and a reconstruction network for depth estimation and image restoration. Compared with conventional deep optics approaches, we accurately simulate both on-axis and off-axis PSFs, eliminating the need for calibration. We validate the high similarity between captured and simulated PSFs at a 19.4° field of view (FOV). Our optimized phase mask and network achieve state-of-the-art performance among semantic-based monocular depth estimation methods and existing deep optics methods. In real-world experiments, our prototype camera produces depth distributions similar to those of the Intel D455 infrared structured-light camera.
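For readers unfamiliar with the angular spectrum method mentioned above, the following is a minimal NumPy sketch of how a PSF can be simulated by propagating the complex field behind a phase mask to the sensor plane. It illustrates only the diffraction step of the abstract's pipeline; the grid size, pixel pitch, wavelength, propagation distance, and flat stand-in phase are illustrative assumptions, not the paper's actual DOE or optical parameters.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a complex field a distance z via the angular spectrum method.

    field: 2-D complex array sampled on a square grid with pitch dx (meters).
    """
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)          # spatial frequencies (cycles/m)
    FX, FY = np.meshgrid(fx, fx)
    # Propagating components satisfy fx^2 + fy^2 <= 1/wavelength^2;
    # evanescent components are suppressed by the mask below.
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)   # angular spectrum transfer function
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Toy on-axis PSF under plane-wave illumination (hypothetical parameters):
n, dx, wavelength, z = 512, 2e-6, 550e-9, 5e-3
phase = np.zeros((n, n))                  # stand-in for an optimized DOE phase
field = np.exp(1j * phase)                # unit-amplitude field after the mask
psf = np.abs(angular_spectrum_propagate(field, wavelength, dx, z)) ** 2
psf /= psf.sum()                          # normalize to unit energy
```

In an end-to-end setting such as the one the abstract describes, this propagation would be written in a differentiable framework so gradients can flow from the reconstruction loss back into the DOE phase; the NumPy version here is only for clarity.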
