Abstract
Inspired by the successful application of transformer networks in Natural Language Processing (NLP), we propose a novel point cloud segmentation method based on Bidirectional Encoder Representations from Transformers (BERT). Specifically, the whole point cloud is scanned by multiple overlapping windows. To the best of our knowledge, this is the first attempt to feed each window's point cloud into a BERT model, which outputs per-point semantic labels and high-dimensional context-aware point embeddings. During training, a Kullback-Leibler (KL) divergence-based clustering loss optimizes the network's parameters by computing similarity matrices between the point embeddings and the predicted semantic labels. The final instance labels are obtained by applying a softmax function to these optimized point embeddings. Evaluated on the Stanford 3D Indoor Scene (S3DIS) dataset, our method reaches a mean accuracy (mAcc) of 87.3% on the semantic segmentation task and a mean Average Precision (mAP) on the instance segmentation task. The results on both tasks surpass those of traditional point cloud segmentation models.
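The abstract names a KL-divergence clustering loss computed from similarity matrices between the point embeddings and the predicted semantic labels, but does not give its exact form. The following is a minimal PyTorch sketch under stated assumptions: both similarity matrices are softmax-normalized pairwise dot products, and the embedding-side distribution is trained to match the label-induced one. The function name kl_clustering_loss and all tensor shapes are hypothetical illustrations, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def kl_clustering_loss(embeddings: torch.Tensor, sem_logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a KL-divergence clustering loss.

    embeddings: (N, D) context-aware point embeddings from the BERT encoder.
    sem_logits: (N, C) per-point semantic class logits.
    """
    # Pairwise similarity over points in the embedding space,
    # row-normalized into a probability distribution with softmax.
    emb = F.normalize(embeddings, dim=1)
    log_p = F.log_softmax(emb @ emb.t(), dim=1)   # (N, N) log-probabilities

    # Pairwise similarity implied by the predicted semantic labels.
    sem = F.softmax(sem_logits, dim=1)
    q = F.softmax(sem @ sem.t(), dim=1)           # (N, N) probabilities

    # F.kl_div computes KL(q || p): the embedding similarities are pulled
    # toward the label-induced similarities; the target side is detached
    # so gradients flow only through the embedding branch.
    return F.kl_div(log_p, q.detach(), reduction="batchmean")

# Usage sketch: 1024 points, 128-dim embeddings, 13 S3DIS semantic classes.
if __name__ == "__main__":
    emb = torch.randn(1024, 128, requires_grad=True)
    logits = torch.randn(1024, 13)
    loss = kl_clustering_loss(emb, logits)
    loss.backward()
    print(loss.item())
```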