Abstract

Camera tracking and the construction of a robust and accurate map in unknown environments are still challenging tasks in computer vision and robotics. Visual Simultaneous Localization and Mapping (SLAM) and Augmented Reality (AR) are two important applications whose performance depends entirely on the accuracy of the camera tracking routine. This paper presents a novel feature-based approach to the monocular SLAM problem using a hand-held camera in room-sized workspaces with a maximum scene depth of 4–5 m. At the core of the proposed method, a Particle Filter (PF) is responsible for estimating the extrinsic parameters of the camera. In addition, contrary to key-frame based methods, the proposed system tracks the camera frame by frame and constructs a robust and accurate map incrementally. Moreover, the proposed algorithm initially constructs a metric sparse map. To this end, a chessboard pattern with a known cell size is placed in front of the camera for a few frames. This enables the algorithm to compute the pose of the camera accurately, so the depths of the initially detected natural feature points are easily calculated. Afterwards, camera pose estimation for each new incoming frame is carried out in a framework that works solely with a set of visible natural landmarks. To recover the depth of newly detected landmarks, a delayed approach based on linear triangulation is used. The proposed method is applied to a real-world VGA-quality video (640 × 480 pixels), where the average translation error of the camera pose is less than 2 cm and the orientation error is less than 3 degrees, which indicates the effectiveness and accuracy of the developed algorithm.
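The metric initialization described above, computing the camera pose from a chessboard of known cell size, can be illustrated with a standard planar-target technique. The sketch below is a minimal NumPy illustration, not necessarily the authors' exact routine: it recovers rotation R and translation t from a homography H that maps chessboard plane coordinates (Z = 0) to pixels, assuming known camera intrinsics K.

```python
import numpy as np

def pose_from_planar_homography(H, K):
    """Recover (R, t) from a homography mapping plane points (Z = 0) to pixels.

    For a planar target, x ~ K [r1 r2 t] [X Y 1]^T, so K^-1 H equals
    [r1 r2 t] up to scale; the scale is fixed by the unit-norm rotation columns.
    """
    M = np.linalg.inv(K) @ H
    M = M / np.linalg.norm(M[:, 0])   # enforce ||r1|| = 1
    r1, r2, t = M[:, 0], M[:, 1], M[:, 2]
    r3 = np.cross(r1, r2)             # complete the right-handed frame
    R = np.column_stack([r1, r2, r3])
    U, _, Vt = np.linalg.svd(R)       # project onto SO(3) to absorb noise
    return U @ Vt, t
```

Given pixel detections of the chessboard corners and their metric plane coordinates from the known cell size, H itself can be estimated with a standard DLT; the resulting pose then fixes the metric scale of the initial map.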

Highlights

  • The purpose of vision-based camera tracking is to estimate the camera pose from a sequence of input images often in the form of video frames

  • Vision-based camera tracking has a close relation with some fundamental problems in computer vision, e.g., 3D reconstruction, image registration, and Augmented Reality (AR)

  • A planar chessboard pattern is placed on the desk that is used for calculation of the ground truth camera pose


Introduction

The purpose of vision-based camera tracking is to estimate the camera pose from a sequence of input images, often in the form of video frames. Vision-based camera tracking is closely related to several fundamental problems in computer vision, e.g., 3D reconstruction, image registration, and AR. Visual SLAM aims to estimate the camera trajectory and, at the same time, construct a sparse or dense representation of the scene. Visual SLAM solutions incrementally build a map of the observed scene and use this map to localize the camera. Unlike cameras equipped with active sensors, a monocular camera is a passive, bearing-only sensor that can only produce 2D measurements of the 3D observed scene.
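Because a monocular camera yields only bearing measurements, the depth of a landmark must be recovered by combining observations from at least two camera poses. The following is a minimal sketch of the linear (DLT) triangulation the paper mentions for delayed depth recovery, assuming known 3×4 projection matrices for the two views; it is illustrative, not the authors' implementation.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices; x1, x2: 2D pixel observations.
    Each view contributes two rows of the homogeneous system A X = 0.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                # right singular vector of the smallest singular value
    return X[:3] / X[3]       # dehomogenize
```

The least-squares solution of A X = 0 is the right singular vector associated with the smallest singular value, which makes the estimate robust to small pixel noise once the two viewpoints provide sufficient parallax.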

