Pose estimation is an important component of many real-world computer vision systems. Most existing pose estimation algorithms need a large number of point correspondences to determine the pose of an object accurately. Because the number of detectable correspondences depends on the object's appearance, the lighting, and other external conditions, detecting many points is not always feasible. In many real-world applications, however, the movement of objects is constrained, e.g., by gravity, so estimating the pose with only three degrees of freedom is usually sufficient. This allows us to improve the accuracy of pose estimation by reformulating the underlying equations of the perspective-n-point problem in three unknowns instead of six. With the simplified equations, our algorithm is more robust against detection errors when only a few point correspondences are available. In this article, we study three scenarios in which such constraints apply. In the first, a stationary camera detects a vehicle to assist the driver in parking it on a specific spot. In the second, a moving camera detects objects in its environment, which is common for driver assistance systems, autonomous cars, and mobile robots. In the third, a camera observes objects from a bird's-eye view, as occurs in industrial applications. In all three scenarios, the observed objects can only move in the ground plane and rotate around the vertical axis, so three degrees of freedom are sufficient to describe the pose. Experiments with synthetic data and real-world photographs show that our algorithm outperforms state-of-the-art pose estimation algorithms. Depending on the scenario, our algorithm can achieve 50% better accuracy while being equally fast.
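To make the reduction from six unknowns to three concrete, the following is a minimal sketch of a three-degree-of-freedom pose parameterization and its reprojection error. It is an illustration under stated assumptions, not the algorithm evaluated in this article: the function names `project_3dof` and `reprojection_error`, the camera-frame convention (x right, y down, z forward), the pinhole intrinsics, and the known fixed vertical offset `height` are all hypothetical choices made for the example.

```python
import math

def project_3dof(points, x, z, theta, f=800.0, cx=320.0, cy=240.0, height=0.0):
    """Project 3D object points into the image for a pose with three unknowns.

    Assumed convention (not taken from the article): the object rotates by
    `theta` around the vertical (y) axis and translates by (x, z) in the
    ground plane; the vertical offset `height` is known and kept fixed.
    """
    c, s = math.cos(theta), math.sin(theta)
    pixels = []
    for px, py, pz in points:
        # Rotation around the vertical axis followed by in-plane translation.
        xc = c * px + s * pz + x
        yc = py + height
        zc = -s * px + c * pz + z
        # Pinhole projection with assumed focal length and principal point.
        pixels.append((f * xc / zc + cx, f * yc / zc + cy))
    return pixels

def reprojection_error(params, points, observed):
    """Sum of squared pixel distances for the pose params = (x, z, theta)."""
    x, z, theta = params
    projected = project_3dof(points, x, z, theta)
    return sum((u - uo) ** 2 + (v - vo) ** 2
               for (u, v), (uo, vo) in zip(projected, observed))

# Four hypothetical model points and a synthetic observation of them.
model_points = [(-1.0, 0.0, -0.5), (1.0, 0.0, -0.5),
                (1.0, -0.5, 0.5), (-1.0, -0.5, 0.5)]
observed = project_3dof(model_points, 0.5, 5.0, 0.3)
```

A pose estimate would then be obtained by minimizing `reprojection_error` over the three parameters `(x, z, theta)` with any nonlinear least-squares routine. Because only three unknowns remain, fewer correspondences are needed to constrain the solution than in the general six-unknown case.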