Abstract

Identifying the orientation and location of a camera placed arbitrarily in a room is a challenging problem. Existing approaches impose common assumptions (e.g. the ground plane is the largest plane in the scene, or the camera roll angle is zero). We present a method for estimating the ground plane and camera orientation in an unknown indoor environment given RGB-D data (colour and depth) from a camera with arbitrary orientation and location, assuming that at least one person can be seen moving smoothly within the camera field of view with their body perpendicular to the ground plane. From a set of RGB-D data trials captured using a Kinect sensor, we develop an approach to identify potential ground planes, cluster objects in the scenes and find 2D Scale-Invariant Feature Transform (SIFT) keypoints for those objects, and then build a motion sequence for each object by evaluating the intersection of each object's histogram in three dimensions across frames. After finding the reliable homography for all objects, we identify the moving human object by checking the change in the histogram intersection, the object dimensions and the trajectory vector of the homography decomposition. We then estimate the ground plane from the potential planes using the normal vector of the homography decomposition, the trajectory vector, and the spatial relationship of the planes to the other objects in the scene. Our results show that the ground plane can be successfully detected, if visible, regardless of camera orientation, ground plane size, and movement speed of the human. We evaluated our approach on our own data and on three public datasets, robustly estimating the ground plane in all indoor scenarios. Our approach substantially reduces restrictions on prior knowledge of the ground plane, and has broad application in dynamic and cluttered environments, as well as in fields such as automated robotics, localization and mapping.
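Two of the steps above can be illustrated compactly: matching an object across frames by intersecting its 3D histograms, and selecting the ground-plane candidate whose normal best aligns with the walking person's up direction. The sketch below is illustrative only and is not the authors' implementation; the function names (`hist3d`, `intersection`, `pick_ground`), the bin count, and the synthetic point clouds are all assumptions introduced for the example.

```python
import numpy as np

def hist3d(points, bins=8, rng=((0, 1), (0, 1), (0, 1))):
    """Normalised 3D histogram of an object's (x, y, z) point cloud."""
    h, _ = np.histogramdd(points, bins=bins, range=rng)
    return h / h.sum()

def intersection(h1, h2):
    """Histogram intersection score in [0, 1]; 1 means identical."""
    return np.minimum(h1, h2).sum()

def pick_ground(normals, up):
    """Index of the candidate plane whose unit normal is most
    parallel to the person's up vector (perpendicular to trajectory)."""
    scores = [abs(np.dot(n / np.linalg.norm(n), up)) for n in normals]
    return int(np.argmax(scores))

# Synthetic example: the same cluster shifted slightly between frames
# should score far higher than an unrelated cluster.
gen = np.random.default_rng(0)
obj_t0 = gen.uniform(0.2, 0.4, size=(500, 3))   # object at frame t
obj_t1 = obj_t0 + 0.01                          # same object, smooth motion
other = gen.uniform(0.6, 0.9, size=(500, 3))    # a different object

h0, h1, h2 = (hist3d(p) for p in (obj_t0, obj_t1, other))
assert intersection(h0, h1) > intersection(h0, h2)

# Candidate plane normals: the first is nearly vertical in camera space.
normals = [[0.0, 0.99, 0.1], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
up = np.array([0.0, 1.0, 0.0])  # from the homography decomposition (assumed)
ground_idx = pick_ground(normals, up)
```

In the full method the up vector would come from the decomposed homography of the tracked person rather than being given, and the spatial relationship of each plane to other objects would further filter the candidates.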

Highlights

  • With one additional dimension, 3D data provide a more intuitive and realistic environmental perspective in computer vision applications than traditional 2D data

  • Existing ground plane detection approaches require that significant assumptions are met

  • Our approach robustly finds the indoor ground plane with unrestrictive assumptions: the sensor is an RGB-D camera; at least one person walks smoothly in the scene with most of the body visible within the camera field of view; and the human body is perpendicular to the ground plane while walking


Introduction

3D data provide a more intuitive and realistic environmental perspective in computer vision applications than traditional 2D data. By combining traditional 2D RGB data with depth information, 3D data create a more comprehensive digital representation of real-world environments, providing considerable value in many applications such as training and simulation [1]–[3], construction [4]–[6] and gaming [7]–[10]. The benefits of 3D data over 2D data are particularly noticeable in cluttered or dynamic environments.

