Abstract

Multi-label face attribute recognition (FAR) refers to the task of predicting a set of attribute labels for a facial image. However, existing FAR methods do not work well for recognizing attributes of different scales, since most frameworks use only the features of the last layer and ignore the detailed information that is crucial for FAR. To address this problem, we propose a prior-guided multi-scale fusion transformer, which fuses features of different scales under the guidance of prior knowledge about the attributes. First, we employ a unified Graph Convolution Network (GCN) to model the relations among attributes using the prior knowledge of facial labels and the statistical co-occurrence frequencies between attributes. Second, we propose a multi-scale fusion module that uses adaptive attention to fuse features from two adjacent layers, and then fuses features of different scales hierarchically to explore multi-level relations. In addition, we adopt a transformer as the feature extraction module to capture global correlations among the extracted features. Experiments on a large-scale face attribute dataset verify the effectiveness of the proposed method both qualitatively and quantitatively.

Keywords: Face attribute recognition, Multi-scale, Prior-guided
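The abstract describes two components: a GCN that propagates attribute embeddings over a co-occurrence graph, and an adaptive-attention module that fuses features from adjacent scales and is applied hierarchically across a feature pyramid. The paper does not provide code; the sketch below (PyTorch) is a minimal illustration of these two ideas, and all module names, layer sizes, and the gating formulation are assumptions rather than the authors' implementation.

```python
# Illustrative sketch only; names (PriorGCN, AdaptiveFusion) and shapes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PriorGCN(nn.Module):
    """Two-layer GCN over an attribute co-occurrence graph.

    `adj` is a normalized adjacency matrix built from label co-occurrence
    statistics (the attribute prior); `x` holds one embedding per attribute.
    """

    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)

    def forward(self, x, adj):
        x = F.relu(adj @ self.w1(x))   # propagate embeddings along the graph
        return adj @ self.w2(x)        # (num_attrs, out_dim) attribute features


class AdaptiveFusion(nn.Module):
    """Fuses feature maps from two adjacent scales with a learned attention gate.

    Both inputs are assumed to have `dim` channels; the coarser map is
    upsampled to the finer resolution before gating.
    """

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, fine, coarse):
        coarse = F.interpolate(coarse, size=fine.shape[-2:],
                               mode="bilinear", align_corners=False)
        f = fine.permute(0, 2, 3, 1)       # channels-last for the linear gate
        c = coarse.permute(0, 2, 3, 1)
        a = self.gate(torch.cat([f, c], dim=-1))   # per-position attention weights
        fused = a * f + (1.0 - a) * c
        return fused.permute(0, 3, 1, 2)


# Hierarchical use over a feature pyramid [p2, p3, p4, p5] (fine -> coarse),
# fusing one adjacent pair at a time:
#   fused = p5
#   for feat in (p4, p3, p2):
#       fused = fusion(feat, fused)
```

In this reading, the fused multi-scale map would be pooled and scored against the GCN output to produce the per-attribute logits; the exact interaction between the two branches follows the paper, not this sketch.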
