Recent psychological and neural studies indicate that when people talk, their changing facial expressions and head movements provide a dynamic cue for recognition. The human visual system (HVS) therefore uses both fixed facial features and dynamic personal characteristics to recognize faces. However, most automatic recognition systems use only the static information, as it is unclear how the dynamic cue can be integrated and exploited. The few works that attempt to combine facial structure and its dynamics do not consider the relative importance of these two cues; instead, they combine them in an ad hoc manner. But what is the relative importance of each cue on its own? Does combining them systematically enhance recognition performance? To date, no work has extensively studied these issues. In this article, we investigate them by analyzing the effects of incorporating dynamic information in video-based automatic face recognition. We consider two factors, face sequence length and image quality, and study their effects on the performance of video-based systems that use a spatio-temporal representation instead of one based on a still image. We experiment with two different databases and consider the temporal hidden Markov model (HMM) and the auto-regressive and moving average model (ARMA) as baseline methods for the spatio-temporal representation, and principal component analysis (PCA) and linear discriminant analysis (LDA) for the image-based one. The extensive experimental results show that motion information also enhances automatic recognition, but not in a systematic way as it does in the HVS.