Abstract

There is growing physiological and practical evidence for the usefulness of component-based (e.g., local-feature-based) approaches to generic object recognition (Matsugu & Cardon, 2004; Wolf et al., 2006; Mutch & Lowe, 2006; Serre et al., 2007), which are robust to variability in appearance due to occlusion and to changes in pose, size and illumination. Low-level features such as edges are clearly important and are used in most visual recognition tasks. However, only a few studies address the economical and efficient use of intermediate visual features for higher-level cognitive functions (Torralba et al., 2004; Opelt et al., 2006). In this chapter, inspired by cortical processing, we address the problem of efficient selection and economical use of visual features for face recognition (FR) as well as facial expression recognition (FER). We demonstrate that by training our previously proposed hierarchical neural network architecture (modified convolutional neural networks: MCoNN; Matsugu et al., 2002) for face detection (FD), higher-order visual functions such as FR and FER can be organized to share such local features. The MCoNN differs from previously proposed networks in that training is done layer by layer for intermediate as well as global features, so that neurons in higher layers have larger receptive fields. Higher-level (e.g., more complex) features are defined in terms of the spatial arrangement of lower-level local features in the preceding layer. In the chapter, we define a common framework for higher-level cognitive functions using the same network architecture (i.e., MCoNN) as a substrate, as follows.

• In Section 2, we demonstrate two examples of learning local features suitable for FD in our MCoNN (Matsugu & Cardon, 2004). One approach is heuristic, supervised training by showing exemplar local features or image patches; the other is unsupervised training using a self-organizing map (SOM) combined with supervised training in the MCoNN.

• In the proposed framework, both FR and FER use common local features (e.g., corner-like end-stop structures) learnt from exemplary image fragments (e.g., mouth corners, eye corners) for FD. Specifically, in Section 3, spatial arrangement information of such local features is extracted implicitly for FR as feature vectors used in SVM classifiers (Matsugu et al., 2004). In the case of FER described in Section 4, spatial arrangement of
