Abstract

We present SpecTrHuMS, a Spectral Transformer for 3D triangular Human Mesh Sequence learning that combines established deep learning models with spectral mesh processing to capture the characteristics of 3D shapes as well as the temporal dependencies between frames. Unlike previous work in this field, our approach operates directly on a compressed representation of the geometry, the spectral coefficients, rather than relying solely on skeleton joints, which carry no surface information. The vertices of each mesh in a sequence are first projected onto the eigenvectors of the Graph Laplacian computed from the common triangulation. A convolutional encoder then encodes each frame into lower-dimensional latent variables that preserve as much of the spectral information as possible. These latent variables are passed through a transformer architecture so that the model captures the context of the sequence and learns temporal dependencies between frames. Each frame of the transformer's output is then decoded by a convolutional decoder that reconstructs the input spectral coefficients. Finally, all frames are transformed back into the spatial domain, yielding a general process able to handle 4D surfaces with constant connectivity. Our method is evaluated on a prediction task on AMASS, a dataset of human surface sequences, showing that our model produces realistic movements while preserving the identity of a subject, and that this work is a significant step towards efficient, high-quality representation of triangular mesh sequences with constant connectivity. Additional experiments show that our model extends easily to other tasks, such as long-term prediction and completion, and that it generalizes to other datasets with constant connectivity. This work opens up new possibilities for applications in animation, virtual reality, and computer graphics. Pretrained models, the code to train them, and the code to create datasets will be made publicly available.
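
For concreteness, the spectral projection and back-projection steps that open and close the pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the combinatorial Laplacian L = D - A on a toy grid mesh, and the number of retained coefficients k, the helper graph_laplacian, and all toy data are hypothetical choices made for illustration only.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def graph_laplacian(num_vertices, faces):
    """Combinatorial graph Laplacian L = D - A built from triangle edges.
    (Hypothetical helper; the paper may use a different Laplacian variant.)"""
    i = np.concatenate([faces[:, 0], faces[:, 1], faces[:, 2]])
    j = np.concatenate([faces[:, 1], faces[:, 2], faces[:, 0]])
    adj = sp.coo_matrix((np.ones(len(i)), (i, j)), shape=(num_vertices, num_vertices))
    adj = ((adj + adj.T) > 0).astype(np.float64)   # symmetric, binary adjacency
    degree = sp.diags(np.asarray(adj.sum(axis=1)).ravel())
    return (degree - adj).tocsc()

# Toy sequence: a g x g grid mesh whose z-coordinate waves over T frames.
# The triangulation `faces` is shared by every frame (constant connectivity).
g, T = 16, 12
idx = np.arange(g * g).reshape(g, g)
quads = np.stack([idx[:-1, :-1], idx[1:, :-1], idx[1:, 1:], idx[:-1, 1:]], -1).reshape(-1, 4)
faces = np.concatenate([quads[:, [0, 1, 2]], quads[:, [0, 2, 3]]])   # (F, 3)
x, y = np.meshgrid(np.linspace(0, 1, g), np.linspace(0, 1, g), indexing="ij")
base = np.stack([x.ravel(), y.ravel(), np.zeros(g * g)], axis=1)     # (N, 3) rest pose
V = np.repeat(base[None], T, axis=0)                                 # (T, N, 3)
V[:, :, 2] = 0.1 * np.sin(np.linspace(0, 2 * np.pi, T))[:, None] * np.sin(np.pi * x.ravel())

L = graph_laplacian(g * g, faces)
k = 64   # number of retained spectral coefficients (assumed, for illustration)
# Shift-invert around a small negative sigma keeps the factorization nonsingular
# and returns the k lowest-frequency eigenvectors of L.
_, phi = eigsh(L, k=k, sigma=-0.01, which="LM")                      # (N, k) basis

coeffs = np.einsum("nk,tnc->tkc", phi, V)      # project each frame: (T, k, 3)
V_rec = np.einsum("nk,tkc->tnc", phi, coeffs)  # back-project to the spatial domain
print(f"max reconstruction error with k={k}:", np.abs(V - V_rec).max())
```

Because the eigenvectors are orthonormal, back-projection is simply multiplication by the basis, and truncating to k coefficients acts as a low-pass filter on the geometry; the encoder, transformer, and decoder described above operate on the (T, k, 3) coefficient tensor rather than on raw vertices.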
