Selection of Key Frames for 3D Reconstruction in Real Time

Alan Koschel, Christoph Müller, Alexander Reiterer

doi:10.3390/a14110303

Abstract

Cameras play a prominent role in the context of 3D data, as they can be designed to be very cheap and small and can therefore be used in many 3D reconstruction systems. Typical cameras capture video at 20 to 60 frames per second, resulting in a high number of frames to select from for 3D reconstruction. Many frames are unsuited for reconstruction because they suffer from motion blur or show too little variation compared to other frames. The camera used in this work has built-in inertial sensors. What if one could use these sensors to select, in real time, a set of key frames well suited for 3D reconstruction, free from motion blur and redundancy? A random forest (RF) classifier is trained on inertial data to identify frames without motion blur and to reduce redundancy. To label the training frames correctly, the frames are analyzed with the fast Fourier transform and the Lucas–Kanade method, which detect motion blur and moving features, respectively. We obtain a classifier that successfully omits redundant frames and preserves frames of the required quality but exhibits unsatisfactory performance with respect to ideal frames. A 3D reconstruction with Meshroom yields a better result when it uses the key frames selected by the classifier. By extracting frames from video, one can comfortably scan objects and scenes without taking single pictures. Our proposed method automatically extracts the best frames in real time without using complex image-processing algorithms.

Highlights

This paper aims to classify frames with a random forest (RF) classifier in order to identify suitable frames from inertial data in real time.
We applied the 2D discrete Fourier transform (2D-DFT) to each frame in the calibration recording, taking only heterogeneous boxes into account.
We achieved a classifier that successfully omitted redundant frames and preserved frames with the required quality. It exhibited unsatisfactory performance with respect to ideal frames.
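
The box-wise 2D-DFT analysis mentioned in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation: the box size, the standard-deviation test for "heterogeneous" boxes, the frequency cutoff, and the function names `high_freq_ratio` and `frame_sharpness` are all assumptions made for illustration. The idea shown is only that motion blur suppresses high spatial frequencies, so a blurred frame tends to have a smaller share of its spectral energy at high frequencies than a sharp one.

```python
import numpy as np


def high_freq_ratio(box: np.ndarray, cutoff: float = 0.25) -> float:
    """Share of spectral energy above a radial frequency `cutoff` (cycles/pixel).

    `cutoff` is an assumed parameter, not a value taken from the paper.
    """
    box = box.astype(np.float64)
    box = box - box.mean()                      # remove the DC component
    power = np.abs(np.fft.fft2(box)) ** 2       # 2D-DFT power spectrum
    fy = np.fft.fftfreq(box.shape[0])[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(box.shape[1])[None, :]  # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    if total == 0.0:
        return 0.0
    return float(power[radius > cutoff].sum() / total)


def frame_sharpness(frame: np.ndarray, box_size: int = 32,
                    min_std: float = 5.0, cutoff: float = 0.25):
    """Mean high-frequency ratio over heterogeneous boxes of a grayscale frame.

    A box counts as heterogeneous here if its intensity standard deviation
    is at least `min_std` (an assumed criterion). Returns None when no box
    qualifies, since a textureless frame carries no usable blur evidence.
    """
    height, width = frame.shape
    scores = []
    for y in range(0, height - box_size + 1, box_size):
        for x in range(0, width - box_size + 1, box_size):
            box = frame[y:y + box_size, x:x + box_size]
            if box.std() < min_std:             # skip homogeneous boxes
                continue
            scores.append(high_freq_ratio(box, cutoff))
    return float(np.mean(scores)) if scores else None
```

In a labeling pipeline of the kind described, a score like this could be thresholded per frame to mark it as blurred or sharp before training the classifier; the threshold would have to be calibrated on real recordings.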

Summary

Introduction

A detailed description of the environment in the form of images or 3D data is of major importance for a whole range of applications. Such data are used for planning and developing our transport infrastructure [1], as well as for geomonitoring [2] and medical practice [3]. The data are collected by a wide variety of sensors, in most cases light detection and ranging (LiDAR), cameras, or radar. Cameras play a prominent role in this context, as they can be designed to be very cheap and small and can be used in almost any detection system, either as a stand-alone acquisition system or in combination with the previously mentioned sensors.

Objectives

Methods

Results

Conclusion