Abstract

<p indent=0mm>For indoor object detection, the input scenes are often complex and defective: the RGB-D scanning data may be incomplete, and objects may occlude one another. Meanwhile, owing to the limitations of a single RGB-D frame or a single point cloud as input, it is always difficult to detect all 3D objects in an indoor scene simultaneously. To overcome these issues and to alleviate the low efficiency of indoor object detection, an efficient 3D object detection method that takes RGB-D video streams as input is proposed. First, RGB-D video streams of different indoor environments are captured with a Kinect camera, yielding continuous RGB frames and the corresponding point cloud data. Second, a Hash function is adopted to extract content-sensitive key frames from the continuous RGB frames, and a semantic relationship among objects is constructed according to the type and number of 3D objects contained in adjacent key frames, ensuring that the different objects appear in each key frame. Then, the 3D objects in the extracted key frames are detected with VoteNet, and the detection results for the remaining frames are estimated from the relative pose relationship between adjacent frames using the quaternion spherical linear interpolation (SLERP) algorithm. In this way, efficient 3D object detection is achieved for every frame of the RGB-D video stream. The key-frame object detection network is trained on the SUN RGB-D dataset; the detection results of the proposed method are accurate, and the overall detection time is greatly reduced compared with a VoteNet-based frame-by-frame detection scheme. Experimental results demonstrate that the proposed method is both effective and efficient.
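The abstract propagates key-frame detections to intermediate frames via quaternion spherical linear interpolation of the relative pose between adjacent key frames. As an illustration only (the paper's actual pipeline is not reproduced here), a minimal SLERP sketch over unit quaternions in `(w, x, y, z)` order might look like the following; the function name and conventions are assumptions, not taken from the paper:

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions (w, x, y, z).

    t = 0 returns q0, t = 1 returns q1; intermediate t values trace the
    shortest great-circle arc on the unit hypersphere, which is how a pose
    between two key frames can be estimated.
    """
    def normalize(q):
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)

    q0, q1 = normalize(q0), normalize(q1)
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to take the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    # Nearly parallel quaternions: fall back to normalized linear
    # interpolation to avoid dividing by sin(theta) ~ 0.
    if dot > 0.9995:
        return normalize(tuple(a + t * (b - a) for a, b in zip(q0, q1)))
    theta = math.acos(min(1.0, dot))
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
```

For example, interpolating halfway between the identity rotation and a 180° rotation about the y-axis gives a 90° rotation about the y-axis.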
