Local Zernike Moment Representation for Facial Affect Recognition

Evangelos Sariyanidi, Andrea Cavallaro, Muhittin Gökmen, Hatice Gunes

doi:10.5244/c.27.108

Abstract

Local representations have become popular for facial affect recognition because they efficiently capture image discontinuities, which play an important role in interpreting facial actions. We propose to use Local Zernike Moments (ZMs) [4] because they provide a useful and compact description of image discontinuities and texture. Their main advantage over well-established alternatives such as Local Binary Patterns (LBPs) [5] is their flexibility in terms of the size and level of detail of the local description. We introduce a local ZM-based representation which involves a non-linear encoding layer (quantisation). This layer maps similar facial configurations together and increases compactness. We demonstrate the use of the local ZM-based representation for posed and naturalistic affect recognition on standard datasets, and show its superiority to alternative approaches for both tasks.

Contemporary representations are often designed as frameworks consisting of three layers [2]: (local) feature extraction, non-linear encoding and pooling. Non-linear encoding aims at enhancing the relevance of local features by increasing their robustness against image noise. Pooling describes small spatial neighbourhoods as single entities, ignoring the precise location of the encoded features and increasing the tolerance against small geometric inconsistencies. In what follows, we describe the proposed local ZM-based representation scheme in terms of this three-layered framework.

Feature Extraction – Local Zernike Moments: The computation of (complex) ZMs can be considered equivalent to representing an image in an alternative space. As shown in Figure 1-a, an image is decomposed onto a set of basis matrices (ZM bases), which are useful for describing variation in different directions and at different scales. ZM bases are orthogonal; therefore, there is no overlap in the information conveyed by each feature (ZM coefficient).
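As a minimal sketch of the decomposition described above, the following Python snippet builds a single ZM basis on a k x k block from the standard Zernike polynomial definition and projects a block onto it. The function names, the mapping of pixel centres into the unit disk and the discrete normalisation are assumptions made for illustration; they are not taken from this paper.

```python
import numpy as np
from math import factorial


def radial_poly(n, m, rho):
    """Standard Zernike radial polynomial R_nm(rho); requires n - |m| even and >= 0."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + m) // 2 - s)
                    * factorial((n - m) // 2 - s)))
        out += coeff * rho ** (n - 2 * s)
    return out


def zernike_basis(n, m, k):
    """Complex ZM basis V_nm sampled on a k x k block.

    Assumption: pixel centres are mapped linearly into (-1, 1) and
    pixels falling outside the unit disk are set to zero.
    """
    coords = (2 * np.arange(k) - k + 1) / k
    x, y = np.meshgrid(coords, coords)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    basis = radial_poly(n, m, rho) * np.exp(1j * m * theta)
    basis[rho > 1] = 0
    return basis


def zernike_moment(block, n, m):
    """Discrete approximation of Z_nm = (n + 1) / pi * sum(f * conj(V_nm)) * pixel_area."""
    k = block.shape[0]
    basis = zernike_basis(n, m, k)
    pixel_area = (2.0 / k) ** 2
    return (n + 1) / np.pi * np.sum(block * np.conj(basis)) * pixel_area


# Illustrative use on a random 7 x 7 block (stand-in for an image patch).
block = np.random.default_rng(0).random((7, 7))
z22 = zernike_moment(block, 2, 2)  # one complex ZM coefficient
```

On a sampled grid the orthogonality of the bases holds only approximately, which is one reason the mapping and normalisation above should be read as one possible choice rather than a fixed convention.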
ZMs are usually computed for the entire image; in that case, however, they cannot capture local variation because ZM bases lack localisation [3]. In contrast, when computed around local neighbourhoods across the image, they become an efficient tool for describing the image discontinuities that are essential to interpreting facial activity.

Non-linear Encoding – Quantisation: We perform quantisation by converting local features into binary values. Such coarse quantisation increases compactness and allows us to code each local block with only a single integer. Figure 1-b illustrates the process of obtaining the Quantised Local ZM (QLZM) image. Firstly, local ZM coefficients are computed across the input image (LZM layer); each image in the LZM layer (LZM image) contains the features that are extracted through a particular ZM basis. Next, each LZM image is converted into a binary image by quantising each pixel via the signum(·) function. Finally, the QLZM image is obtained by combining all of the binary images. Specifically, each pixel at a particular location of the QLZM image is an integer (QLZM integer), computed by concatenating the binary values at the corresponding location of all binary images. The QLZM image is similar to an LBP-transformed image, in the sense that it contains integers of a limited range. Yet the physical meaning of the information encoded by each integer is quite different. LBP integers describe a circular block by considering only the values along the border, neglecting the pixels that remain inside the block. Therefore, the efficient operation scale of LBPs is usually limited to 3-5 pixels [1, 5]. QLZM integers, on the other hand, describe blocks as a whole, and provide flexibility in terms of operation scale without major loss of information.

Pooling – Histograms: Our representation scheme pools encoded features over local histograms. Figure 1-c illustrates the overall pipeline of the proposed representation scheme.
Firstly, the QLZM image is computed through the process illustrated in detail in Figure 1-b. Next, ...

[Figure 1: labels recovered from the figure: ZM Coefficients (local features), ZM Bases]
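The quantisation and pooling steps described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the LZM layer is already available as a stack of real-valued images (for complex coefficients, one plausible choice is to treat real and imaginary parts as separate images), it maps positive values to 1 and all others to 0 as a stand-in for the signum step, and the function names and the 2 x 2 pooling grid are hypothetical.

```python
import numpy as np


def quantise_lzm(lzm_images):
    """Combine B real-valued LZM images of shape (B, H, W) into one QLZM image.

    Each LZM image is binarised by sign (assumption: > 0 -> 1, else 0), and the
    B bits at each pixel are concatenated into an integer in [0, 2**B).
    """
    bits = (lzm_images > 0).astype(np.int64)
    weights = 1 << np.arange(bits.shape[0])      # 1, 2, 4, ...
    return np.tensordot(weights, bits, axes=1)   # shape (H, W)


def pooled_histograms(qlzm, n_codes, grid=(2, 2)):
    """Pool QLZM integers over a grid of cells, one histogram per cell."""
    rows = np.array_split(np.arange(qlzm.shape[0]), grid[0])
    cols = np.array_split(np.arange(qlzm.shape[1]), grid[1])
    hists = []
    for r in rows:
        for c in cols:
            cell = qlzm[np.ix_(r, c)]
            hists.append(np.bincount(cell.ravel(), minlength=n_codes))
    return np.concatenate(hists)


# Illustrative use with random data standing in for four LZM images.
rng = np.random.default_rng(0)
lzm = rng.standard_normal((4, 32, 32))
qlzm = quantise_lzm(lzm)                              # integers in [0, 16)
feature = pooled_histograms(qlzm, n_codes=2 ** lzm.shape[0])
```

With B binary images the histogram of each cell has 2^B bins, so the length of the final feature vector grows with the number of ZM bases and pooling cells rather than with the image size.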
