Abstract

Given a stream of depth images with a known cuboid reference object present in the scene, we propose a novel approach for accurate camera tracking and volumetric surface reconstruction in real-time. Our contribution in this paper is threefold: (a) utilizing a priori knowledge of the precisely manufactured cuboid reference object, we maintain drift-free camera tracking without explicit global optimization; (b) we improve the fidelity of the volumetric surface representation by proposing a prediction-corrected data fusion strategy rather than a simple moving average, which enables accurate reconstruction of high-frequency details such as the sharp edges of objects and geometries of high curvature; (c) we introduce CU3D, a benchmark dataset that contains both synthetic and real-world scanning sequences with ground-truth camera trajectories and surface models for the quantitative evaluation of 3D reconstruction algorithms. We evaluate our algorithm on this dataset and demonstrate its accuracy against other state-of-the-art algorithms. We release both our dataset and code as open source (https://github.com/zhangxaochen/CuFusion) so that other researchers can reproduce and verify our results.
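
For context on contribution (b): KinectFusion-style systems fuse each new depth observation into the volume with a simple weighted moving average per voxel. The following is a minimal C++ sketch of that conventional update, the baseline our prediction-corrected strategy replaces; the names Voxel and fuseMovingAverage are illustrative and not taken from the CuFusion code base.

#include <algorithm>
#include <cstdio>

// One voxel of the TSDF volume.
struct Voxel {
    float tsdf;   // truncated signed distance, normalized to [-1, 1]
    float weight; // accumulated confidence of past observations
};

// Fold one new truncated-distance observation into a voxel as a
// weighted running average (the conventional KinectFusion update).
void fuseMovingAverage(Voxel& v, float observedTsdf, float obsWeight,
                       float maxWeight = 128.0f) {
    v.tsdf = (v.weight * v.tsdf + obsWeight * observedTsdf)
             / (v.weight + obsWeight);
    // Capping the weight keeps the model responsive to new data, but
    // averaging inevitably smooths noisy observations, and with them
    // sharp geometric detail.
    v.weight = std::min(v.weight + obsWeight, maxWeight);
}

int main() {
    Voxel v{0.0f, 0.0f};
    const float observations[] = {0.30f, 0.28f, 0.33f, 0.29f};
    for (float d : observations) fuseMovingAverage(v, d, 1.0f);
    std::printf("fused tsdf = %.3f, weight = %.1f\n", v.tsdf, v.weight);
    return 0;
}

Because each voxel converges to the running average of all observations that touch it, high-frequency structure such as sharp object edges is blurred away; the prediction-corrected fusion proposed here targets exactly this artifact.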

Highlights

  • Real-time camera tracking and simultaneous dense scene reconstruction have been among the most actively studied problems in computer vision in recent years

  • We present CuFusion, a novel approach for real-time 3D scanning and accurate surface reconstruction using a Kinect-style depth camera

Introduction

Real-time camera tracking and simultaneous dense scene reconstruction have been among the most actively studied problems in computer vision in recent years. The advent of depth cameras based on either structured light (e.g., Asus Xtion, Kinect 1.0) or time-of-flight (ToF) sensing (e.g., Kinect 2.0) offers dense depth measurements directly, in real-time, as video streams. Such dense depth sensing has drastically simplified dense 3D modeling, turning widely available Kinect-style depth cameras into consumer-grade 3D scanners. In a typical reconstruction pipeline, an iterative closest point (ICP) algorithm [2] aligns each incoming depth map to the reconstructed volumetric truncated signed distance function (TSDF) [3] surface model to estimate the camera pose; a triangulated 3D mesh model can then be extracted using a Marching Cubes-type algorithm [4].
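
To make the frame-to-model alignment step concrete, the minimal C++ sketch below evaluates the point-to-plane error that such an ICP alignment minimizes when registering the current depth map against the vertices and normals predicted from the TSDF model. The Correspondence type and pointToPlaneCost function are illustrative names, not drawn from the CuFusion implementation.

#include <array>
#include <cstdio>
#include <vector>

using Vec3 = std::array<float, 3>;

static float dot(const Vec3& a, const Vec3& b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

struct Correspondence {
    Vec3 src;    // vertex from the current depth map (camera frame)
    Vec3 dst;    // matched vertex predicted from the TSDF model
    Vec3 normal; // surface normal at the model vertex
};

// Apply a rigid transform given as a row-major rotation matrix R and
// a translation vector t.
static Vec3 transform(const std::array<float, 9>& R, const Vec3& t,
                      const Vec3& p) {
    return {R[0]*p[0] + R[1]*p[1] + R[2]*p[2] + t[0],
            R[3]*p[0] + R[4]*p[1] + R[5]*p[2] + t[1],
            R[6]*p[0] + R[7]*p[1] + R[8]*p[2] + t[2]};
}

// Sum of squared point-to-plane residuals r = n . (T(src) - dst).
static float pointToPlaneCost(const std::array<float, 9>& R, const Vec3& t,
                              const std::vector<Correspondence>& matches) {
    float cost = 0.0f;
    for (const auto& m : matches) {
        Vec3 p = transform(R, t, m.src);
        Vec3 diff{p[0] - m.dst[0], p[1] - m.dst[1], p[2] - m.dst[2]};
        float r = dot(m.normal, diff);
        cost += r * r;
    }
    return cost;
}

int main() {
    // Identity pose; one synthetic correspondence 1 cm off the plane.
    std::array<float, 9> R{1,0,0, 0,1,0, 0,0,1};
    Vec3 t{0.f, 0.f, 0.f};
    std::vector<Correspondence> matches{
        {{0.f, 0.f, 0.51f}, {0.f, 0.f, 0.50f}, {0.f, 0.f, 1.f}}};
    std::printf("point-to-plane cost = %.6f\n",
                pointToPlaneCost(R, t, matches));
    return 0;
}

Practical pipelines linearize this cost around the current pose estimate and solve the resulting six-degree-of-freedom normal equations at frame rate, typically on the GPU; in CuFusion, the known cuboid additionally constrains the estimate so that tracking stays drift-free without global optimization.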
