Abstract
Single-image depth estimation is a longstanding, ill-posed problem in computer vision, yet deep learning has enabled remarkable results with both supervised and self-supervised training paradigms. State-of-the-art solutions achieve highly accurate depth estimation from a single image by deploying very large deep architectures, which require powerful dedicated hardware to run in a reasonable amount of time. This demanding complexity makes them unsuited for a broad category of applications running on devices with constrained resources or limited memory. To tackle this issue, in this paper we propose a family of compact yet effective CNNs for monocular depth estimation, trained with self-supervision from a binocular stereo rig. Compared to complex state-of-the-art models, our lightweight architectures, namely PyD-Net and PyD-Net2, trade a small drop in accuracy for a drastic reduction in runtime and memory requirements, by a factor ranging from $2\times$ to $100\times$. Moreover, our networks can run real-time monocular depth estimation on a broad set of embedded or consumer devices, even those not equipped with a GPU, by early stopping the inference with negligible (or no) loss in accuracy, making them ideally suited for real applications with strict constraints on hardware resources or power consumption.
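The early-stopping behavior mentioned above follows from a coarse-to-fine pyramidal design: each pyramid level refines an upsampled depth map produced at the level below, so inference can halt at a coarser level to trade output resolution for speed. The snippet below is a minimal, hypothetical PyTorch sketch of this idea, not the actual PyD-Net architecture; the class name `TinyPyramidDepthNet`, the number of levels, and the channel widths are illustrative assumptions.

```python
# Minimal sketch of a pyramidal depth decoder with early stopping.
# This is NOT the official PyD-Net implementation; layer widths,
# level count, and module names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    """3x3 conv + LeakyReLU, stride 2 for downsampling in the encoder."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )


class TinyPyramidDepthNet(nn.Module):
    """Coarse-to-fine depth estimator: each pyramid level refines the
    upsampled depth from the level below, so inference can be stopped
    at any level to trade resolution for speed."""

    def __init__(self, levels=4, base_ch=16):
        super().__init__()
        chs = [3] + [base_ch * (2 ** i) for i in range(levels)]
        self.encoders = nn.ModuleList(
            conv_block(chs[i], chs[i + 1]) for i in range(levels)
        )
        # One small decoder head per level predicts a single-channel depth map
        # from that level's features concatenated with the coarser prediction.
        self.decoders = nn.ModuleList(
            nn.Conv2d(chs[i + 1] + 1, 1, kernel_size=3, padding=1)
            for i in range(levels)
        )

    def forward(self, x, stop_level=0):
        """stop_level=0 returns the finest prediction; larger values stop
        earlier at a coarser (and cheaper) pyramid level."""
        feats = []
        for enc in self.encoders:
            x = enc(x)
            feats.append(x)

        depth = torch.zeros(x.size(0), 1, x.size(2), x.size(3), device=x.device)
        # Decode from the coarsest level upward, refining at each step.
        for level in reversed(range(len(feats))):
            depth = F.interpolate(depth, size=feats[level].shape[-2:],
                                  mode="bilinear", align_corners=False)
            depth = torch.sigmoid(self.decoders[level](
                torch.cat([feats[level], depth], dim=1)))
            if level == stop_level:  # early stop: skip the finer levels
                break
        return depth


if __name__ == "__main__":
    net = TinyPyramidDepthNet()
    img = torch.randn(1, 3, 256, 512)
    full = net(img, stop_level=0)   # finest resolution, slowest
    fast = net(img, stop_level=2)   # coarser output, fewer layers executed
    print(full.shape, fast.shape)
```

In this sketch, the coarse levels already encode the scene layout, so stopping before the finest levels yields a lower-resolution depth map at a fraction of the computation, which is the mechanism the abstract refers to as early stopping the inference.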