Abstract

The ability of robots to understand indoor environments in 3D is critical. While deep learning-based methods have improved performance, they require large amounts of annotated training data. However, scanning and annotating point cloud data in real scenes is costly, leading to data scarcity. There is therefore an urgent need for data-efficient methods for point cloud instance segmentation. To tackle this issue, we propose to leverage the geometric and scene-context knowledge inherent in synthetic data to reduce the need for annotation on real data. Specifically, we simulate the process by which humans scan and collect point cloud data in real-world scenes and construct three large-scale synthetic point cloud datasets from synthetic scenes. The scale of these three datasets is more than ten times that of currently available real-world data. Experimental results demonstrate that incorporating synthetic point cloud data can improve instance segmentation performance by over 18.8 percentage points. Furthermore, to address the domain shift between synthetic and real data, we propose a target-aware pre-training method. It integrates both real and synthetic data during pre-training, allowing the model to learn a feature representation that generalizes effectively to downstream real data. Experiments show that our method achieves consistent improvements on all three synthetic datasets. The data and code will be made publicly available.
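
To make the idea of pre-training on mixed synthetic and real data concrete, the sketch below shows one plausible form such a training step could take. The abstract does not specify an implementation, so everything here is an assumption: the toy encoder, the feature-alignment loss, and all names (`PointCloudEncoder`, `pretrain_step`, `synth_pts`, `real_pts`) are hypothetical placeholders, not the authors' actual method.

```python
# Minimal sketch of mixed synthetic/real pre-training (illustrative only).
# The paper's actual target-aware objective is not described in the abstract;
# this uses a simple feature-alignment loss purely to show one way a model
# could draw on both domains within a single pre-training loop.
import torch
import torch.nn as nn

class PointCloudEncoder(nn.Module):
    """Toy per-point encoder standing in for a real 3D backbone."""
    def __init__(self, in_dim=3, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points):           # points: (B, N, 3)
        return self.mlp(points)          # (B, N, feat_dim)

def pretrain_step(encoder, optimizer, synth_pts, real_pts):
    """One step that mixes a synthetic batch with an unlabeled real batch.

    Aligning pooled synthetic features to real features (mean-feature MSE)
    is a stand-in objective, not the paper's loss.
    """
    feat_s = encoder(synth_pts).mean(dim=1)   # (B, D) pooled synthetic features
    feat_r = encoder(real_pts).mean(dim=1)    # (B, D) pooled real features
    loss = nn.functional.mse_loss(feat_s, feat_r.detach())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    enc = PointCloudEncoder()
    opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
    synth = torch.randn(4, 1024, 3)   # stand-in synthetic scene batch
    real = torch.randn(4, 1024, 3)    # stand-in unlabeled real scan batch
    print(pretrain_step(enc, opt, synth, real))
```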
