Learnable pooling weights for facial expression recognition

M Amine Mahmoudi,Aladine Chetouani,Fatma Boufera,Hedi Tabia

doi:10.1016/j.patrec.2020.09.001

Abstract

Abstract Pooling layers are spatial down-sampling layers used in convolutional neural networks (CNN) to gradually downscale the feature map, increase the receptive field size and reduce the number of the parameters in the model. The use of pooling layers leads to less computing complexity and memory consumption reduction but also introduces invariance to certain filter distortions which may induce subtle detail loss. This behaviour is undesired for some fine-grained recognition tasks such as facial expression recognition (FER) which highly relies on specific regional distortion detection. In this paper, we introduce a more filter distortion aware pooling layer based on kernel functions. The proposed pooling reduces the feature map dimensions while keeping track of the majority of the information fed to the next layer instead of ignoring part of them. The experiments on RAF, FER2013 and ExpW databases demonstrate the benefits of such layer and show that our model achieves competitive results with respect to the state-of-the-art approaches.

Full Text