Abstract

By combining depth information with color images, RGB-D cameras provide ready detection of humans and the associated 3D skeleton joint data, facilitating if not revolutionizing conventional image-centric research in, among other fields, computer vision, surveillance, and human activity analysis. The applicability of an RGB-D camera, however, is restricted by its limited depth frustum, typically ranging from 0.8 to 4 meters. Although an RGB-D camera network, constructed by deploying several RGB-D cameras at various locations, could extend the range of coverage, it requires precise localization of the camera network: the relative location and orientation of neighboring cameras. By introducing a skeleton-based viewpoint invariant transformation (SVIT) that derives the relative location and orientation of a detected human's upper torso with respect to an RGB-D camera, this paper presents a reliable automatic localization technique that requires no additional instruments or human intervention. By applying SVIT to a commonly observed skeleton at each of two neighboring RGB-D cameras, the relative position and orientation of the detected human's skeleton with respect to each camera can be obtained and then combined to yield the relative position and orientation of the two cameras, thus solving the localization problem. Experiments were conducted in which two Kinects were situated with bearing differences of about 45 degrees and 90 degrees; installing the additional Kinect extended coverage by up to 70%. The same localization technique can be applied repeatedly to a larger number of RGB-D cameras, extending the applicability of RGB-D cameras to camera networks for human behavior analysis and context-aware services over a larger surveillance area.
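The combination step described above amounts to composing rigid transforms: if SVIT yields the pose of the shared skeleton's torso frame in each camera's coordinate frame, the camera-to-camera pose follows by composing one pose with the inverse of the other. The sketch below illustrates this with numpy, assuming hypothetical SVIT outputs as 4x4 homogeneous transforms (the transform values and helper names are illustrative, not from the paper):

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(deg):
    """Rotation matrix about the z-axis by `deg` degrees."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

# Hypothetical SVIT outputs: pose of the commonly observed skeleton's
# torso frame as seen from each camera (camera -> skeleton transforms).
T_cam1_skel = make_pose(rot_z(45.0), [1.0, 0.2, 2.5])
T_cam2_skel = make_pose(rot_z(-45.0), [-0.8, 0.1, 2.0])

# Relative pose of camera 2 expressed in camera 1's frame:
# T_cam1_cam2 = T_cam1_skel @ inv(T_cam2_skel)
T_cam1_cam2 = T_cam1_skel @ np.linalg.inv(T_cam2_skel)
```

Applying `T_cam1_cam2` to points in camera 2's frame maps them into camera 1's frame, which is exactly the localization result needed to merge the two cameras' coverage; repeating this pairwise extends to a larger network.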
