Abstract
Following the significant success of the transformer in NLP and computer vision, this paper attempts to extend it to 3D triangle mesh. The aim is to determine the shape’s global representation using the transformer and capture the inherent manifold information. To this end, this paper proposes a novel learning framework named Navigation Geodesic Distance Transformer (NGD-Transformer) for 3D mesh. Specifically, this approach combined farthest point sampling with the Voronoi segmentation algorithm to spawn uniform and non-overlapping manifold patches. However, the vertex number of these patches was inconsistent. Therefore, self-attention graph pooling is employed for sorting the vertices on each patch and screening out the most representative nodes, which were then reorganized according to their scores to generate tokens and their raw feature embeddings. To better exploit the manifold properties of the mesh, this paper further proposed a novel positional encoding called navigation geodesic distance positional encoding (NGD-PE), which encodes the geodesic distance between vertices relatively and spatial symmetrically. Subsequently, the raw feature embeddings and positional encodings were summed as input embeddings fed to the graph transformer encoder to determine the global representation of the shape. Experiments on several datasets were conducted, and the experimental results show the excellent performance of our proposed method.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have