Abstract

Autonomous robots that deliver medical and emergency supplies offer a way to avoid contact with people in quarantine and to control the spread of contagious diseases indoors. However, understanding and reconstructing a scene with a single low-cost camera remains a challenge. Absolute depth cannot be recovered accurately from a single image, but the relative pose of different planes, which can be inferred from geometric features in a 2-D image, is better suited to scene understanding and reconstruction. In this article, we present an interpretable model that bridges the gap between 2-D scene understanding and three-dimensional (3-D) reconstruction without prior training or any precise depth data. Building on the 2-D semantic information from our previous work, the relative 3-D pose of the detected planes is estimated, and the indoor scene is then approximated in the reconstruction. The approach is interpretable and requires no prior training or knowledge of the camera's intrinsic parameters. We evaluate quantitative performance by the percentage of planes whose relative pose is incorrectly reconstructed. The results demonstrate that the method can successfully understand and reconstruct indoor scenes containing both Manhattan and curved non-Manhattan structures.
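As a rough illustration only (not the authors' calibration-free method), the sketch below shows one standard way relative plane pose can be derived from 2-D geometric features: assuming a calibrated pinhole camera, the normal of a plane follows from the vanishing points of two orthogonal line families on it, and the relative pose of two planes then reduces to the angle between their normals. The intrinsic matrix K and the vanishing-point coordinates are hypothetical values chosen for the example.

import numpy as np

def plane_normal_from_vanishing_points(v1, v2, K):
    # Back-project two vanishing points of orthogonal line families
    # lying on the plane; their cross product spans the plane normal.
    K_inv = np.linalg.inv(K)
    d1 = K_inv @ np.array([v1[0], v1[1], 1.0])
    d2 = K_inv @ np.array([v2[0], v2[1], 1.0])
    n = np.cross(d1, d2)
    return n / np.linalg.norm(n)

def relative_plane_angle_deg(n_a, n_b):
    # Dihedral angle between two planes, given their unit normals.
    c = np.clip(abs(np.dot(n_a, n_b)), 0.0, 1.0)
    return np.degrees(np.arccos(c))

# Hypothetical intrinsics and vanishing points for two walls.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
n_a = plane_normal_from_vanishing_points((1500.0, 240.0), (320.0, -900.0), K)
n_b = plane_normal_from_vanishing_points((-700.0, 240.0), (320.0, -900.0), K)
print(f"relative angle between walls: {relative_plane_angle_deg(n_a, n_b):.1f} deg")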
