Abstract

Most existing 3D object detection methods recognize objects individually, without considering the contextual information shared between them. However, objects in indoor scenes are usually related to each other and to the scene, and these relationships form contextual information. Based on this observation, we propose a novel 3D object detection network that is built on the state-of-the-art VoteNet but takes contextual information at multiple levels into account for detecting and recognizing 3D objects. To encode relationships between elements at different levels, we introduce three contextual sub-modules, which capture contextual information at the patch, object, and scene levels respectively, and build them into the voting and classification stages of VoteNet. In addition, at the post-processing stage, we consider the spatial diversity of detected objects and propose an improved 3D NMS (non-maximum suppression) method, namely Survival-Of-the-Best 3D NMS (SOB-3DNMS), to reduce false detections. Experiments demonstrate that our method is an effective way to improve detection accuracy, and it achieves new state-of-the-art performance on challenging 3D object detection datasets, namely SUN RGB-D and ScanNet, when taking only point cloud data as input.
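For context, the sketch below illustrates the conventional greedy 3D NMS post-processing step that SOB-3DNMS improves upon. It is not the authors' method: the axis-aligned box representation, the function names, and the IoU threshold are illustrative assumptions, and SOB-3DNMS differs in how surviving detections are selected.

```python
# A minimal sketch of standard greedy 3D NMS over axis-aligned boxes,
# shown only to illustrate the post-processing step the abstract refers to.
# Box format (x1, y1, z1, x2, y2, z2), names, and the default threshold are
# assumptions for illustration, not details taken from the paper.
import numpy as np

def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (x1, y1, z1, x2, y2, z2)."""
    lo = np.maximum(box_a[:3], box_b[:3])
    hi = np.minimum(box_a[3:], box_b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter + 1e-8)

def nms_3d(boxes, scores, iou_threshold=0.25):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        remaining = [
            idx for idx in order[1:]
            if iou_3d(boxes[best], boxes[idx]) < iou_threshold
        ]
        order = np.array(remaining, dtype=int)
    return keep
```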
