Mechanisms by which the brain could perform invariant recognition of objects, including faces, are addressed neurophysiologically, and a computational model of how this could occur is then described. Some neurons that respond primarily to faces are found in the macaque cortex in the anterior part of the superior temporal sulcus (a region in which neurons are especially likely to be tuned to facial expression, and to the face movement involved in gesture). They are also found more ventrally in the TE areas which form the inferior temporal gyrus, where the neurons are more likely to have responses related to the identity of faces. These areas project on to the amygdala and orbitofrontal cortex, in which face-selective neurons are also found. Quantitative studies of the responses of the neurons that respond differently to the faces of different individuals show that information about the identity of the individual is represented by the responses of a population of neurons, that is, ensemble encoding is used. This rather distributed encoding of identity (within the class of faces) in these sensory cortical regions has the advantages of maximising the information in the representation useful for discrimination between stimuli, generalisation, and graceful degradation. In contrast, the more sparse representations in structures such as the hippocampus may be useful for maximising the number of different memories stored. There is evidence that the responses of some of these neurons are altered by experience, so that new stimuli become incorporated in the network within only a few seconds of exposure to a new stimulus. The representation built in the temporal cortical areas shows considerable invariance for size, contrast, spatial frequency and translation, and is thus in a form which is particularly useful for storage and as an output from the visual system. It is also shown that one of the representations built is view-invariant, which is suitable for recognition and as an input to associative memory. Another is viewer-centred, which is appropriate for conveying information about gesture. These computational processes operate rapidly, in that in a backward masking paradigm 20–40 ms of neuronal activity in a cortical area is sufficient to support face recognition. In a clinical application of these findings, it is shown that humans with ventral frontal lobe damage in some cases have impairments in identifying face and voice expression. These impairments are correlated with, and may contribute to, the problems some of these patients have in emotional and social behaviour. To help provide an understanding of how the invariant recognition described could be performed by the brain, a neuronal network model of processing in the ventral visual system is described. The model uses a multistage feed-forward architecture, and is able to learn invariant representations of objects, including faces, by use of a Hebbian synaptic modification rule which incorporates a short memory trace (0.5 s) of preceding activity, enabling the network to learn the properties of objects which are spatio-temporally invariant over this time scale.
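The final sentence refers to a Hebbian rule gated by a short memory trace of preceding postsynaptic activity. The Python sketch below illustrates one common form of such a trace rule, in which a running trace ȳ(t) = (1 − η)y(t) + ηȳ(t−1) gates a Hebbian weight update; the layer sizes, parameter values, weight normalisation, and the way transforms are generated are illustrative assumptions for a single layer, not the model's actual implementation.

```python
import numpy as np

# Minimal sketch of a Hebbian rule gated by a short memory trace of
# preceding activity. Parameter values, names, and the single-layer
# setup are illustrative assumptions.

rng = np.random.default_rng(0)

n_inputs = 100      # presynaptic (earlier-layer) firing rates
n_outputs = 20      # postsynaptic neurons in one layer
eta = 0.8           # trace persistence; sets the effective memory span
alpha = 0.01        # learning rate

W = rng.random((n_outputs, n_inputs)) * 0.1   # feed-forward weights
trace = np.zeros(n_outputs)                   # running trace of postsynaptic activity

def present(x, W, trace):
    """One time step: compute output rates, update the trace, apply the rule."""
    y = np.maximum(W @ x, 0.0)                # linear-threshold output rates
    trace = (1.0 - eta) * y + eta * trace     # short memory trace of preceding activity
    W = W + alpha * np.outer(trace, x)        # Hebbian update gated by the trace
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # keep weight vectors bounded
    return W, trace

# Successive transforms of the same object (e.g. one face across translations)
# are presented in consecutive time steps, so the trace links its different
# views; the trace is reset between objects.
for obj in range(5):
    trace[:] = 0.0
    base = rng.random(n_inputs)
    for t in range(10):                       # successive transforms of one object
        x = np.roll(base, t)                  # crude stand-in for a translated view
        W, trace = present(x, W, trace)
```

Because the trace carries over the activity evoked by the immediately preceding transforms of the same object, the update associates those transforms onto the same output neurons, which is the mechanism by which spatio-temporally linked views come to be represented invariantly.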