Abstract
Structure from Motion (SfM) is a fundamental computer vision problem that deep learning has not yet handled well. One promising solution is to build explicit structural constraints, e.g., 3D cost volumes, into the neural network. Obtaining accurate camera poses from images alone can be challenging, especially under complicated environmental conditions. Existing methods usually assume accurate camera poses, taken either from ground truth (GT) or from other methods, which is unrealistic in practice and requires additional sensors. In this work, we design a physically driven architecture, DeepSFM, inspired by traditional Bundle Adjustment (BA), which consists of two cost-volume-based architectures that iteratively refine depth and pose. The explicit constraints on both depth and pose, combined with the learning components, bring together the merits of traditional BA and emerging deep learning technology. To improve training and inference efficiency, we apply Gated Recurrent Unit (GRU)-based depth and pose update modules with coarse-to-fine cost volumes in the iterative refinement. In addition, with an extended residual depth prediction module, our model can be adapted to dynamic scenes effectively. Extensive experiments on various datasets show that our model achieves state-of-the-art performance with superior robustness against challenging inputs.
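To make the idea of a coarse-to-fine cost volume concrete, the following is a minimal toy sketch, not the DeepSFM network. It assumes a 1D scene at a single fronto-parallel depth, a known product of focal length and baseline, and a plain photometric error in place of learned features and GRU updates. All names (`signal`, `matching_cost`, `coarse_to_fine_depth`) and constants are hypothetical and chosen only for illustration.

```python
# Toy coarse-to-fine depth search over a 1D photometric cost volume.
# Assumption: this is an illustration only; it contains no learning,
# no pose estimation, and no GRU-based update module.

FOCAL_TIMES_BASELINE = 12.0  # assumed focal length x baseline, in pixels
TRUE_DEPTH = 2.5             # depth of the (hypothetical) scene plane
WIDTH = 32                   # number of pixels on the scanline


def signal(u):
    """Hypothetical scene texture: a smooth convex ramp."""
    return 0.05 * u * u


def reference(x):
    """Intensity seen at pixel x in the reference view."""
    return signal(x)


def source(u):
    """Source view: the same texture shifted by the true disparity."""
    return signal(u + FOCAL_TIMES_BASELINE / TRUE_DEPTH)


def matching_cost(depth):
    """Mean absolute photometric error when warping with one depth hypothesis."""
    disparity = FOCAL_TIMES_BASELINE / depth
    errors = [abs(reference(x) - source(x - disparity)) for x in range(WIDTH)]
    return sum(errors) / len(errors)


def cost_volume(depths):
    """One aggregated cost per depth hypothesis."""
    return [matching_cost(d) for d in depths]


def best_hypothesis(depths):
    """Return the depth hypothesis with the lowest cost."""
    costs = cost_volume(depths)
    return depths[costs.index(min(costs))]


def coarse_to_fine_depth(coarse_depths, fine_radius, fine_steps):
    """Pick the best coarse depth, then resample hypotheses densely around it."""
    coarse = best_hypothesis(coarse_depths)
    step = 2.0 * fine_radius / fine_steps
    fine_depths = [coarse - fine_radius + k * step for k in range(fine_steps + 1)]
    return coarse, best_hypothesis(fine_depths)


coarse, fine = coarse_to_fine_depth(
    [1.0, 2.0, 3.0, 4.0, 8.0], fine_radius=1.0, fine_steps=8
)
print(coarse, fine)
```

The coarse pass evaluates only five hypotheses, and the fine pass spends its nine hypotheses near the coarse winner. This is the efficiency argument for coarse-to-fine volumes: fine sampling across the whole depth range would need many more evaluations for the same resolution.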
More From: IEEE Transactions on Pattern Analysis and Machine Intelligence