Abstract. Performing 3D object detection using only the point clouds collected by a Kinect sensor is a very challenging task in indoor scene understanding. In this paper, we propose a novel point-cloud-based 3D object detection framework named Weighted VoteNet (W-VoteNet), which systematically enhances seed points, voting points, and object proposals to varying degrees, thereby improving detection accuracy on indoor scene point clouds. First, a new neighborhood feature enhancement module is designed to enhance the features of each seed point by making better use of the features in its neighborhood. Second, a weighted voting module is implemented to increase voting accuracy, allowing more foreground seed points to participate in voting. Finally, a new semantic relation reasoning module is proposed to extract the semantic relationship features of the object proposals, which further reduces the false alarm rate. All three proposed modules contribute to more accurate voting and more effective proposals. Experimental results on two benchmark indoor 3D object detection datasets, SUN RGB-D and ScanNet V2, demonstrate the effectiveness of our approach. The source code is available at: https://github.com/Guo-JQ/W-VoteNet.