Human age estimation from a single image is a challenging task because the appearance changes produced by the slow aging process are subtle. In this article, we propose a compact multi-attention deep network for age estimation based on fine-grained learning and the visual attention mechanism. Because age estimation is a fine-grained visual classification problem, it relies not only on the global features of the face image but also on fine-grained feature representations from age-sensitive local regions. Accurate age estimation therefore benefits from multi-scale features and their fusion. To this end, we propose a multi-attention model built on a complementary two-stream compact network. For a given intermediate feature map from the network, spatial attentions and channel attentions are inferred in both a self-attention and a mutual-attention manner. To emphasize crucial features from age-sensitive regions, the multi-attention maps are then multiplied with the input feature map for adaptive feature refinement. Finally, the refined feature maps at multiple layers are aggregated into the fine-grained feature used for age estimation. Compared with bulky models, our model is compact and end-to-end trainable, yet its performance is competitive with state-of-the-art methods.
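To make the attention-based refinement described above concrete, the following is a minimal PyTorch-style sketch of channel and spatial attention applied to an intermediate feature map, followed by aggregation of refined features from multiple layers. It assumes a CBAM-like self-attention formulation with illustrative layer sizes; it is not the authors' implementation, and the mutual-attention (cross-stream) component is omitted.

```python
# Illustrative sketch (not the paper's exact architecture): channel and spatial
# attention maps refine an intermediate feature map by element-wise multiplication.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        # Pool over spatial dimensions, then produce one weight per channel.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        return torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Pool over channels, then produce one weight per spatial location.
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class AttentionRefine(nn.Module):
    """Refines a feature map: x -> x * channel_attn(x) * spatial_attn(x)."""

    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)  # emphasize informative channels
        x = x * self.sa(x)  # emphasize age-sensitive spatial regions
        return x


# Example: refine feature maps from two intermediate layers and aggregate them
# into a single fine-grained feature vector for the age estimation head.
feat_shallow = torch.randn(1, 64, 28, 28)
feat_deep = torch.randn(1, 128, 14, 14)
refined = [AttentionRefine(64)(feat_shallow), AttentionRefine(128)(feat_deep)]
pooled = [f.mean(dim=(2, 3)) for f in refined]  # global average pooling per layer
fused = torch.cat(pooled, dim=1)  # aggregated multi-layer feature (shape: [1, 192])
```

In this sketch the attention maps are broadcast-multiplied with the input feature map, mirroring the adaptive feature refinement step described in the abstract; the multi-layer aggregation is shown here as simple pooling and concatenation for illustration.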