Abstract

Object detection is a fundamental capability of many intelligent autonomous systems, such as service robots and autonomous vehicles. State-of-the-art 3D object detection methods are deep-learning methods that require large amounts of annotated data for successful training. Acquiring and labeling real-world data is an expensive and time-consuming task. An alternative is to train object detectors on synthetically generated images. Since modern object detectors trained on synthetic data still do not achieve results comparable to those trained on real data, generating realistic synthetic data remains an open problem. We address the problem of object detection in indoor environments based on 3D point clouds acquired by 3D cameras. We provide a detailed analysis of the importance of five factors involved in generating synthetic data: camera noise, presence of background objects, positioning of objects, scene context, and object sizes. To investigate the importance of these factors, we developed a fully modular method for generating realistic synthetic single-view point clouds for training object detectors. Our method can generate large amounts of customizable data in a short time. An interesting finding is that pre-training with our data and fine-tuning with real data improves the performance of 3D object detectors, enabling one of them to achieve state-of-the-art results on one of the benchmarks without the use of RGB data.
