Abstract

Semantic segmentation of sensed point cloud data plays a significant role in scene understanding and reconstruction, robot navigation, etc. This work presents a Graph Convolutional Network integrating K-Nearest Neighbor (KNN) searching and the Vector of Locally Aggregated Descriptors (VLAD). KNN searching is utilized to construct the topological graph of each point and its neighbors. Then, we perform convolution on the edges of the constructed graph to extract representative local features with multiple Multilayer Perceptrons (MLPs). Afterwards, a trainable VLAD layer, NetVLAD, is embedded in the feature encoder to aggregate the local and global contextual features. The designed feature encoder is repeated multiple times, and the extracted features are concatenated in a skip-connection style to strengthen their distinctiveness and thereby improve the segmentation. Experimental results on two datasets show that the proposed work addresses the shortcoming of insufficient local feature extraction and improves the accuracy of semantic segmentation (mIoU 60.9% and oAcc 87.4% on S3DIS) compared to existing models.
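The local feature encoder described above (KNN graph construction followed by convolution on graph edges) can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the single linear layer standing in for the MLPs, the feature sizes, and the max-pooling over neighbors are illustrative assumptions.

```python
import numpy as np

def knn_graph(points, k):
    """Return the indices of each point's k nearest neighbors (self excluded)."""
    # Pairwise squared Euclidean distances, (N, N)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]  # (N, k)

def edge_conv(points, neighbors, weights):
    """Convolve over graph edges: per-edge features through a shared layer,
    then max-pool over each point's k neighbors."""
    k = neighbors.shape[1]
    xi = np.repeat(points[:, None, :], k, axis=1)      # center point, (N, k, D)
    xj = points[neighbors]                             # neighbor points, (N, k, D)
    edge_feats = np.concatenate([xi, xj - xi], axis=-1)  # [x_i, x_j - x_i], (N, k, 2D)
    h = np.maximum(edge_feats @ weights, 0.0)          # shared linear layer + ReLU
    return h.max(axis=1)                               # max-pool neighbors, (N, F)

rng = np.random.default_rng(0)
pts = rng.standard_normal((100, 3))        # toy point cloud, N=100 points in 3D
nbrs = knn_graph(pts, k=8)
w = rng.standard_normal((6, 16))           # 2*3 edge-feature dims -> 16 channels
local_feats = edge_conv(pts, nbrs, w)
print(local_feats.shape)                   # (100, 16)
```

In practice the shared layer would be a trained multi-layer MLP and the edge features would carry color or learned channels in addition to coordinates; the graph construction and edge-wise pooling structure are the same.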

Highlights

  • As one of the key technologies for scene understanding, semantic segmentation [1,2,3] of 3D point clouds plays a fundamental role in the fields of 3D reconstruction, autonomous driving, and robotics

  • Experimental results on two datasets show that KVGCN achieves comparable or superior performance compared with state-of-the-art methods

  • This paper proposes a convolution network based on a K-Nearest Neighbor searching (KNN) topological graph to encode the local features
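The trainable VLAD aggregation (NetVLAD) named in the abstract can be sketched as follows. The cluster count, feature size, and the distance-based soft assignment below are simplifying assumptions; the actual NetVLAD layer learns its assignment weights end to end.

```python
import numpy as np

def netvlad(features, centers, alpha=1.0):
    """Aggregate N local features into one global descriptor:
    soft-assign each feature to K cluster centers, sum the weighted
    residuals per cluster, then intra-normalize and L2-normalize."""
    # Soft assignment via softmax over negative squared distances, (N, K)
    logits = -alpha * ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    # Residuals of each feature from each center, (N, K, D)
    resid = features[:, None, :] - centers[None, :, :]
    # Weighted sum of residuals per cluster, (K, D)
    v = (a[:, :, None] * resid).sum(axis=0)
    v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12  # intra-normalization
    out = v.ravel()                                        # flatten to K*D
    return out / (np.linalg.norm(out) + 1e-12)             # final L2 norm

rng = np.random.default_rng(1)
feats = rng.standard_normal((100, 16))   # toy local features from the encoder
ctrs = rng.standard_normal((8, 16))      # 8 hypothetical cluster centers
desc = netvlad(feats, ctrs)
print(desc.shape)                        # (128,) = 8 clusters x 16 dims
```

Embedding such a layer inside the encoder, as the paper does, lets the aggregated contextual descriptor be concatenated back with the per-point local features.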



Introduction

As one of the key technologies for scene understanding, semantic segmentation [1,2,3] of 3D point clouds plays a fundamental role in the fields of 3D reconstruction, autonomous driving, and robotics. Vehicles in autonomous driving applications need to interpret the objects (e.g., pedestrians and cars) and their motion states in outdoor scenes before making reliable decisions. For a robot, reconstructing and parsing models of the surrounding environment is the premise of navigation and object manipulation. Unlike 2D images, point clouds are unstructured, unevenly distributed, and large in data volume, making them difficult to process and analyze with conventional methods. Great attention has therefore been paid to achieving reliable semantic segmentation of point clouds with deep learning. How to effectively learn representative features from unorganized point clouds remains a challenging problem.

