Abstract
Vision-based unsupervised learning [1]–[3] has emerged as a promising approach to estimating monocular depth and ego-motion, avoiding the intensive effort of collecting and labeling ground truth. However, such methods are still constrained by the brightness-constancy assumption across video sequences, and they are especially susceptible to frequent illumination variations and nearby textureless surfaces in indoor environments. In this article, we selectively combine the complementary strengths of visual and inertial measurements, i.e., videos capture static and distinctive features while inertial readings depict scale-consistent and environment-agnostic motion, and propose a novel unsupervised learning framework that predicts monocular depth and the ego-motion trajectory simultaneously. This challenging task is addressed by learning from both forward and backward inertial sequences to suppress inevitable sensor noise, and by reweighting visual and inertial features via gated neural networks to adapt to varying environments and user-specific motion dynamics. In addition, we employ structural cues to produce scene depths from a single image and explore structure-consistency constraints to calibrate the depth estimates in indoor buildings. Experiments on the outdoor KITTI data set and our dedicated indoor prototype show that our approach consistently outperforms the state of the art on both depth and ego-motion estimation. To the best of our knowledge, this is the first work to fuse visual and inertial data without any supervision signals for monocular depth and ego-motion estimation, and our solution remains effective and robust even in textureless indoor scenarios.
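For illustration, the sketch below shows one plausible way to reweight visual and inertial features with a gated neural network, as described above: a sigmoid gate conditioned on both modalities produces per-channel weights that blend the two feature streams. The module structure, layer sizes, and names (e.g., GatedVisualInertialFusion) are assumptions for exposition, not the authors' actual architecture.

```python
# Minimal sketch of gated visual-inertial feature reweighting (assumed design,
# not the paper's exact network).
import torch
import torch.nn as nn

class GatedVisualInertialFusion(nn.Module):
    def __init__(self, visual_dim=512, inertial_dim=128, fused_dim=256):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.inertial_proj = nn.Linear(inertial_dim, fused_dim)
        # The gate observes both modalities and outputs per-channel weights
        # in [0, 1] that decide how much each modality contributes.
        self.gate = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim),
            nn.Sigmoid(),
        )

    def forward(self, visual_feat, inertial_feat):
        v = self.visual_proj(visual_feat)      # (B, fused_dim)
        i = self.inertial_proj(inertial_feat)  # (B, fused_dim)
        g = self.gate(torch.cat([v, i], dim=-1))
        # Convex combination: g near 1 trusts vision, g near 0 trusts the IMU.
        return g * v + (1.0 - g) * i

if __name__ == "__main__":
    fusion = GatedVisualInertialFusion()
    visual = torch.randn(4, 512)    # e.g., pooled CNN features from a frame pair
    inertial = torch.randn(4, 128)  # e.g., encoded IMU window between frames
    fused = fusion(visual, inertial)
    print(fused.shape)              # torch.Size([4, 256])
```

In such a design, the learned gate can downweight visual features in textureless or poorly lit scenes and lean on the inertial stream instead, which matches the motivation stated in the abstract.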