Integrating Contextual Information and Attention Mechanisms with Sparse Convolution for the Extraction of Internal Objects within Buildings from Three-Dimensional Point Clouds

Mingyang Yu, Zhongxu Li, Qiuxiao Xu, Xin Chen, Weikang Cui, Fei Su, Qingrui Ji

doi:10.3390/buildings14030636

Abstract

Deep learning-based semantic segmentation of point clouds has grown steadily in popularity, with sparse convolution being the most prominent example. Although sparse convolution is more efficient than regular convolution, it sacrifices global context information. To address this problem, this paper proposes the OcspareNet network, which uses sparse convolution as its backbone and captures global contextual information through an offset attention module and a context aggregation module. The offset attention module improves the network’s capacity to obtain global contextual information about the point cloud. The context aggregation module exploits contextual information in both the training and testing phases, which strengthens the network’s ability to discern overall structure and improves segmentation accuracy on categories from difficult scenes. Compared with state-of-the-art (SOTA) models, our model has a smaller parameter count and achieves higher accuracy on challenging segmentation categories in the ScanNetV2 dataset, namely ‘pictures’, ‘counters’, and ‘desks’, with IoU scores of 41.1%, 70.3%, and 72.5%, respectively. Furthermore, ablation experiments confirm the efficacy of the designed modules.
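The per-category IoU scores quoted above follow the standard intersection-over-union definition: for a class c, the number of points both predicted and labelled as c, divided by the number of points predicted or labelled as c. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `per_class_iou` and the toy label lists are hypothetical and used purely for illustration.

```python
def per_class_iou(pred, target, num_classes):
    """Return one IoU value per class, or None if a class never appears.

    pred, target: equal-length sequences of integer class labels, one per point.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Points where prediction and ground truth both say class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Points where either prediction or ground truth says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


# Toy example with six points and three classes (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(pred, target, 3)
```

Returning `None` for a class that appears in neither the predictions nor the labels is one possible design choice; it avoids a division by zero and keeps absent classes out of any mean IoU computed afterwards.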

Full Text