We present a drift-free visual compass that estimates the three-degree-of-freedom (DoF) rotational motion of a camera by recognizing structural regularities in a Manhattan world (MW), which assumes that the major structures conform to three orthogonal principal directions. Existing Manhattan frame estimation approaches rely on either data sampling or parameter search, and cannot guarantee accuracy and efficiency simultaneously. To overcome these limitations, we propose a novel approach that hybridizes the two strategies, achieving quasi-global optimality and high efficiency. We first compute two DoF of the camera orientation by detecting and tracking a dominant vertical direction from a depth camera or an IMU, and then search for the optimal third DoF using image lines through the proposed Manhattan Mine-and-Stab (MnS) approach. Once we have the initial rotation estimate, we refine the absolute camera orientation by minimizing the average orthogonal distance from the endpoints of the lines to the MW axes. We compare the proposed algorithm with other state-of-the-art approaches on a variety of real-world datasets, including data from a drone flying in an urban environment, and demonstrate that the proposed method outperforms them in terms of accuracy, efficiency, and stability. The code is available on the project page: https://github.com/PyojinKim/MWMS
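To make the two-stage idea concrete, the following is a minimal sketch of the reduced one-DoF problem: with the vertical direction known, the two horizontal MW axes depend on a single angle, which can be scored against the image lines. It is not the Mine-and-Stab procedure itself; a brute-force angular sweep with a truncated residual score stands in for it. The function names, the `steps` and `tol` parameters, and the choice of score are illustrative assumptions and do not come from the released code.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def horizontal_basis(vertical):
    """Two unit vectors spanning the plane orthogonal to `vertical`."""
    v = normalize(vertical)
    # Pick the coordinate axis least aligned with v to avoid a degenerate cross product.
    k = min(range(3), key=lambda i: abs(v[i]))
    helper = tuple(1.0 if i == k else 0.0 for i in range(3))
    u = normalize(cross(v, helper))
    w = cross(v, u)
    return u, w

def line_normal(p1, p2):
    """Unit normal of the plane through the camera centre and an image line.

    Endpoints are (x, y) in normalized image coordinates (assumed calibrated).
    """
    return normalize(cross((p1[0], p1[1], 1.0), (p2[0], p2[1], 1.0)))

def sweep_third_dof(vertical, normals, steps=900, tol=0.01):
    """Brute-force search over the remaining angle in [0, pi/2).

    A line supports a horizontal axis h when its normal n satisfies n . h = 0,
    so each line contributes max(0, tol - residual) to the score.
    """
    u, w = horizontal_basis(vertical)
    best_theta, best_score = 0.0, -1.0
    for k in range(steps):
        theta = (math.pi / 2) * k / steps
        c, s = math.cos(theta), math.sin(theta)
        h1 = tuple(c * u[i] + s * w[i] for i in range(3))
        h2 = tuple(-s * u[i] + c * w[i] for i in range(3))
        score = 0.0
        for n in normals:
            residual = min(abs(dot(n, h1)), abs(dot(n, h2)))
            score += max(0.0, tol - residual)
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta, best_score
```

The sweep costs time proportional to the number of lines at every angular step and is only as accurate as the grid spacing, which is the accuracy-versus-efficiency trade-off of a plain parameter search that the hybrid approach is meant to avoid.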