Abstract

The polygonal mesh has proven to be a powerful representation of 3D shapes, given its efficiency in expressing shape surfaces while maintaining geometric and topological information. Increasing effort has been devoted to designing elaborate deep convolutional neural networks for meshes. However, these methods inherently ignore the global connectivity among mesh primitives due to the local nature of convolution operations. In this paper, we introduce a transformer-like self-attention mechanism with down-sampling architectures for mesh learning to capture both the global and local relationships among mesh faces. To achieve this, we propose BFS-Pooling, which converts a connected mesh into discrete tokens (i.e., sets of adjacent faces) via breadth-first search (BFS) and naturally builds hierarchical architectures for mesh learning by pooling mesh tokens. Benefiting from BFS-Pooling, we design a hierarchical transformer architecture with a window-based local attention mechanism, the Mesh Window Transformer (MWFormer). Experimental results demonstrate that MWFormer achieves the best or competitive performance on both mesh classification and mesh segmentation tasks. Code will be made available.
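The abstract describes BFS-Pooling only at a high level. The following Python sketch is an illustrative approximation (not the paper's implementation) of how breadth-first search over a face-adjacency structure can group adjacent faces into fixed-size tokens; the function name, the `token_size` parameter, and the adjacency format are assumptions made for illustration.

```python
from collections import deque

def bfs_face_tokens(face_adjacency, token_size=16):
    """Group mesh faces into tokens of adjacent faces via breadth-first search.

    face_adjacency: dict mapping a face index to a list of neighboring face indices
    token_size: maximum number of faces per token (hypothetical parameter)
    Returns a list of tokens, each a list of face indices.
    """
    visited = set()
    tokens = []
    for seed in face_adjacency:
        if seed in visited:
            continue
        # Start a new token from an unvisited seed face and grow it by BFS.
        queue = deque([seed])
        token = []
        while queue and len(token) < token_size:
            face = queue.popleft()
            if face in visited:
                continue
            visited.add(face)
            token.append(face)
            for neighbor in face_adjacency[face]:
                if neighbor not in visited:
                    queue.append(neighbor)
        if token:
            tokens.append(token)
    # Faces left in the queue when a token fills up stay unvisited,
    # so they seed later tokens; every face ends up in exactly one token.
    return tokens

# Example: a tiny strip of 6 faces, each adjacent to its neighbors.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(bfs_face_tokens(adjacency, token_size=3))  # [[0, 1, 2], [3, 4, 5]]
```

In this sketch, the resulting tokens of adjacent faces could serve as the windows over which local attention is computed, and pooling a token to a single representative feature would coarsen the mesh for the next hierarchical stage.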
