Age estimation from face images is a crucial task with practical applications in areas such as video surveillance and Internet access control. While deep learning-based age estimation methods built on Convolutional Neural Networks (CNNs), Multilayer Perceptrons (MLPs), and Transformers have shown remarkable performance, they are limited in modeling complex or irregular objects and often carry redundant information. Although existing graph neural networks (GNNs) can alleviate this issue, they consider only the structural information of the image and ignore its semantic features, which limits the feature representation capability of the graph nodes. To address these issues, this paper proposes a novel Masked Contrastive Graph Representation Learning (MCGRL) method for accurate age estimation from face images. Our approach leverages a CNN to extract semantic features of the image, which are then partitioned into patches that serve as nodes in the graph. We use a masked graph convolutional network (GCN) to derive image-based node representations that capture rich structural information. Finally, we incorporate multiple losses to explore the complementary relationship between structural information and semantic features, which improves the feature representation capability of the GCN. Experimental results on real-world face image datasets demonstrate the superiority of our proposed method over other state-of-the-art age estimation approaches. Our code is available at https://github.com/yuntaoshou/MCGRL
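The pipeline summarized above (CNN feature map, patches as graph nodes, masking, graph convolution) can be illustrated with a minimal NumPy sketch. It is not the released MCGRL implementation. The function names, the k-nearest-neighbour graph construction, the zero-vector masking, the single symmetric-normalized GCN layer, and all sizes are assumptions made for illustration, and the CNN backbone and the losses are omitted.

```python
import numpy as np


def patches_to_nodes(feature_map, patch):
    """Split a (C, H, W) feature map into non-overlapping patch x patch
    blocks and flatten each block into one node feature vector."""
    c, h, w = feature_map.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    gh, gw = h // patch, w // patch
    blocks = feature_map.reshape(c, gh, patch, gw, patch)
    # Reorder to (gh, gw, C, patch, patch), then flatten each block.
    return blocks.transpose(1, 3, 0, 2, 4).reshape(gh * gw, c * patch * patch)


def knn_adjacency(nodes, k):
    """Symmetric k-nearest-neighbour adjacency with self-loops.
    (Assumed graph construction; the abstract does not specify one.)"""
    n = nodes.shape[0]
    dist = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self when picking neighbours
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)  # make the graph undirected
    np.fill_diagonal(adj, 1.0)    # add self-loops
    return adj


def mask_nodes(nodes, ratio, rng):
    """Zero out a random subset of node features (assumed masking scheme)."""
    n = nodes.shape[0]
    masked_idx = rng.choice(n, size=int(round(ratio * n)), replace=False)
    masked = nodes.copy()
    masked[masked_idx] = 0.0
    return masked, masked_idx


def gcn_layer(nodes, adj, weight):
    """One graph convolution: ReLU(D^{-1/2} A D^{-1/2} X W)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ nodes @ weight, 0.0)


# Toy run: a random array stands in for the CNN feature map.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 4, 4))       # (C, H, W)
nodes = patches_to_nodes(feature_map, patch=2)     # 4 nodes, 32 features each
adj = knn_adjacency(nodes, k=2)
masked, masked_idx = mask_nodes(nodes, ratio=0.5, rng=rng)
weight = 0.1 * rng.standard_normal((nodes.shape[1], 16))
hidden = gcn_layer(masked, adj, weight)            # (4, 16) node representations
```

In this sketch the graph is built from the unmasked node features and only the inputs to the graph convolution are masked; whether MCGRL masks nodes, edges, or features is not stated in the abstract.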