Abstract
We propose a method for learning a scene coordinate regression model that performs accurate camera relocalization in a known environment from a single RGB image. Our method incorporates self-supervision for scene coordinates via multi-view geometric constraints to improve training. More specifically, we use an image-based warp error between different views of a scene point to improve the network's ability to regress the correct absolute scene coordinates of that point. We explore both raw RGB values and learned deep features as the basis for this warp error. We provide a thorough analysis of the effect of each component of our framework and evaluate our method on both indoor and outdoor datasets. We show that, compared to a coordinate regression model trained with single-view information only, this multi-view constraint benefits both the learning process and the final performance: it not only helps the network converge faster than a model trained with a single-view reprojection loss, but also improves the accuracy of absolute pose estimation from a single RGB image over the prior art.
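As a rough illustration of the multi-view warp error described above, the following PyTorch sketch projects scene coordinates predicted for one view into a second view with known pose and penalizes the RGB difference. All tensor shapes, the function name `multiview_warp_loss`, and the world-to-camera pose convention are our assumptions for illustration, not the paper's actual implementation; the feature-based variant would substitute feature maps for the RGB images.

```python
import torch
import torch.nn.functional as F

def multiview_warp_loss(scene_coords, rgb_a, rgb_b, pose_b_w2c, K):
    """Hypothetical sketch of a multi-view photometric warp loss.

    scene_coords: (B, 3, H, W) world-frame coordinates predicted for view A
    rgb_a, rgb_b: (B, 3, H, W) images of views A and B
    pose_b_w2c:   (B, 3, 4) world-to-camera extrinsics [R | t] of view B
    K:            (B, 3, 3) camera intrinsics
    """
    B, _, H, W = scene_coords.shape
    # Flatten predicted world points into homogeneous form (B, 4, H*W).
    pts = scene_coords.view(B, 3, -1)
    ones = torch.ones(B, 1, pts.shape[-1], device=pts.device)
    pts_h = torch.cat([pts, ones], dim=1)

    # Transform into view B's camera frame and project with the intrinsics.
    cam = torch.bmm(pose_b_w2c, pts_h)                # (B, 3, H*W)
    proj = torch.bmm(K, cam)
    z = proj[:, 2:3].clamp(min=1e-6)                  # guard against division by zero
    uv = proj[:, :2] / z                              # pixel coordinates in view B

    # Normalize to [-1, 1] for grid_sample and warp view B toward view A.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    rgb_b_warped = F.grid_sample(rgb_b, grid, align_corners=True)

    # Mask out points behind the camera or projecting off-image.
    valid = (cam[:, 2] > 0) & (u.abs() <= 1) & (v.abs() <= 1)
    valid = valid.view(B, 1, H, W).float()

    # Photometric error between view A and the warped view B.
    return (valid * (rgb_a - rgb_b_warped).abs()).sum() / valid.sum().clamp(min=1.0)
```

Because every step (projection, sampling, masking) is differentiable, this loss can back-propagate into the coordinate regression network alongside the standard single-view reprojection loss.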