Abstract

Visual localization, the task of accurately estimating camera pose within a scene, is a crucial technique in computer vision and robotics. Among the various approaches, relative pose estimation has attracted increasing interest because it can generalize to new scenes: it learns to regress the relative pose between image pairs. However, real images often contain unreliable regions, such as the sky, people, or moving cars, which introduce noise and interfere with localization. In this paper, we propose a novel relative pose estimation pipeline to address this problem. The pipeline features a semantic masking module and an attention module. Together, the two modules suppress interfering information from unreliable regions while emphasizing important features. Experimental results show that our framework outperforms alternative methods in camera pose prediction accuracy in all scenes.
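
The interplay of the two modules can be illustrated with a minimal sketch. It is not the architecture proposed in the paper: the class names in `UNRELIABLE`, the helper names `semantic_mask` and `masked_attention_pool`, and the use of a simple softmax over per-pixel scores are all assumptions made for illustration. The sketch only shows the general idea of zeroing out pixels from unreliable semantic classes and then weighting the remaining features by attention.

```python
import math

# Assumed set of unreliable semantic classes (illustrative, not from the paper).
UNRELIABLE = {"sky", "person", "car"}


def semantic_mask(labels):
    """Return 1.0 for pixels to keep and 0.0 for pixels in unreliable classes."""
    return [0.0 if label in UNRELIABLE else 1.0 for label in labels]


def masked_attention_pool(features, scores, mask):
    """Softmax-attention pooling restricted to unmasked pixels.

    features: list of per-pixel feature vectors (lists of floats)
    scores:   list of per-pixel attention logits
    mask:     list of 0.0/1.0 keep flags from semantic_mask
    Returns the pooled feature vector and the per-pixel weights.
    """
    kept = [s for s, m in zip(scores, mask) if m > 0.0]
    if not kept:
        raise ValueError("all pixels are masked")
    top = max(kept)  # subtract the max kept logit for numerical stability
    # Masked pixels get weight 0 regardless of their attention logit.
    weights = [math.exp(s - top) if m > 0.0 else 0.0
               for s, m in zip(scores, mask)]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


# Toy example: four "pixels", two of which belong to unreliable classes.
labels = ["building", "sky", "road", "car"]
features = [[1.0, 0.0], [5.0, 5.0], [0.0, 1.0], [9.0, 9.0]]
scores = [0.0, 3.0, 0.0, 2.0]

mask = semantic_mask(labels)
pooled, weights = masked_attention_pool(features, scores, mask)
```

In this toy input the sky and car pixels carry the largest attention logits, yet they contribute nothing to the pooled feature because the mask removes them before normalization.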
