Model-based gait recognition methods usually adopt pedestrian walking postures to identify individuals. However, existing methods do not explicitly address the large intra-class variance of human pose caused by changes in camera view. In this paper, we propose a lower-upper generative adversarial network (LUGAN) that generates multi-view pose sequences for each single-view sample to reduce the cross-view variance. Based on a prior of camera imaging, we prove that the spatial coordinates of cross-view poses are related by a linear transformation with a full-rank matrix. Hence, LUGAN employs adversarial training to learn full-rank transformation matrices from the source pose and target view, yielding the target pose sequences. The generator of LUGAN consists of graph convolutional (GCN) layers, fully connected (FC) layers, and two-branch convolutional (CNN) layers: the GCN and FC layers encode the source pose sequence and target view, the CNN layers take the encoded features as input to learn a lower triangular matrix and an upper triangular matrix, and the transformation matrix is finally formed by multiplying the two. For adversarial training, we develop a conditional discriminator that distinguishes whether a pose sequence is real or generated. Furthermore, to facilitate high-level correlation learning, we propose a plug-and-play module, named multi-scale hypergraph convolution (HGC), to replace the spatial graph convolution layer in the baseline; it simultaneously models joint-level, part-level, and body-level correlations. Extensive experiments on three large gait recognition datasets (i.e., CASIA-B, OUMVLP-Pose, and NLPR) demonstrate that our method outperforms the baseline model by a large margin.
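The core linear-algebra idea behind the generator can be sketched as follows. This is a minimal illustration, not the authors' implementation: the parameter matrices here are random stand-ins for what LUGAN's CNN branches would predict, and the joint count (17) and positive-diagonal trick are assumptions. The point it shows is that a matrix built as the product of a lower and an upper triangular matrix with nonzero diagonals is guaranteed full rank, and can therefore act as an invertible cross-view transformation on joint coordinates.

```python
import numpy as np

def full_rank_transform(l_params, u_params, dim=3):
    """Build T = L @ U from raw parameter matrices.

    The determinant of a triangular matrix is the product of its
    diagonal; exponentiating the diagonals makes them strictly
    positive, so det(L) != 0 and det(U) != 0, hence T is full rank.
    """
    L = np.tril(l_params)
    U = np.triu(u_params)
    idx = np.arange(dim)
    L[idx, idx] = np.exp(l_params[idx, idx])
    U[idx, idx] = np.exp(u_params[idx, idx])
    return L @ U

rng = np.random.default_rng(0)
# Stand-ins for the outputs of the two CNN branches.
T = full_rank_transform(rng.normal(size=(3, 3)), rng.normal(size=(3, 3)))

pose = rng.normal(size=(17, 3))   # hypothetical pose: 17 joints, (x, y, z)
target_pose = pose @ T.T          # linearly transform every joint
assert abs(np.linalg.det(T)) > 1e-8   # T is invertible (full rank)
```

Because T is invertible, the mapping between source and target poses loses no information, which matches the paper's full-rank requirement on the learned transformation.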