Abstract
3D object detection in lidar point clouds is an essential task for 3D visual world understanding. To predict bounding box parameters directly from point clouds, existing voting-based methods use Hough voting to obtain the centroid of each object. However, inaccurately voted centers make it difficult to regress boxes accurately and lead to redundant bounding boxes. Objects in indoor scenes follow several co-occurrence patterns, and semantic relations between object layouts and scenes can serve as prior context to guide object detection. We propose a simple yet effective network, RSFF-Net, which adds refined voting and scene feature fusion for indoor 3D object detection. RSFF-Net consists of three modules: geometric function, refined voting, and scene constraint. First, the geometric function module captures the geometric features of the object nearest to each voted point. Then, the refined voting module re-votes the coarse votes based on features fused from the coarse votes and the geometric features. Finally, the scene constraint module adds association information between candidate objects and scenes. RSFF-Net achieves competitive results on two indoor 3D object detection benchmarks: ScanNet V2 and SUN RGB-D.
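The data flow through the three modules can be illustrated with a minimal NumPy sketch. It is not the RSFF-Net implementation: every function name, feature dimension, and the random linear maps standing in for learned layers are hypothetical. Three simplifications are assumed: the "nearest object" of a vote is approximated by its nearest seed point, feature fusion is plain concatenation, and the scene feature is a global max pool over candidate features.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, out_dim):
    """Hypothetical stand-in for a learned layer: a random linear map."""
    w = rng.standard_normal((x.shape[1], out_dim)) * 0.1
    return x @ w

def coarse_vote(seed_xyz, seed_feat):
    # Hough-style voting: each seed point predicts an offset to an object center.
    return seed_xyz + linear(seed_feat, 3)

def geometric_features(votes, seed_xyz, seed_feat):
    # Assumption: approximate the "nearest object" of a vote by its nearest seed point.
    dists = np.linalg.norm(votes[:, None, :] - seed_xyz[None, :, :], axis=-1)
    return seed_feat[dists.argmin(axis=1)]

def refine_votes(votes, vote_feat, geo_feat):
    # Assumption: fuse by concatenation, then predict a residual offset (re-vote).
    fused = np.concatenate([vote_feat, geo_feat], axis=1)
    return votes + linear(fused, 3)

def add_scene_context(cand_feat):
    # Assumption: scene feature = global max pool, appended to every candidate.
    scene = cand_feat.max(axis=0, keepdims=True)
    tiled = np.repeat(scene, len(cand_feat), axis=0)
    return np.concatenate([cand_feat, tiled], axis=1)

# Toy input: 128 seed points with 16-dimensional features.
seed_xyz = rng.standard_normal((128, 3))
seed_feat = rng.standard_normal((128, 16))

votes = coarse_vote(seed_xyz, seed_feat)
geo = geometric_features(votes, seed_xyz, seed_feat)
refined = refine_votes(votes, seed_feat, geo)
cand = add_scene_context(np.concatenate([seed_feat, geo], axis=1))

print(refined.shape, cand.shape)
```

In this sketch the refined votes keep the shape of the coarse votes, and each candidate feature doubles in width once the shared scene feature is appended. A real detector would follow this with vote clustering and box regression, which are omitted here.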