In this paper, we utilized a Hidden Markov Model (HMM) as a mapping mechanism between two different kinds of correlated signals. Specifically, we developed a voice-driven talking head system by exploiting the physical relationship between the shape of the mouth and the sound it produces. The proposed system can be trained easily and animates a talking head efficiently. In the training phase, Mel-scale Frequency Cepstral Coefficients (MFCCs) were extracted from the audio signals and Facial Animation Parameters (FAPs) were extracted from the video signals. Both audio and video features were then combined to train a single HMM. In the synthesis phase, the HMM mapped a completely novel audio track to a FAP sequence, which drove face synthesis through a Facial Animation Engine (FAE). The experiments demonstrated the proposed voice-driven talking head on both male and female speakers, in two styles (speaking and singing) and three languages (Chinese, English, and Taiwanese). Possible applications of the proposed system include computer-aided instruction, online guides, virtual conferencing, lip synchronization, and human-computer interaction.
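To make the two-phase pipeline concrete, the following is a minimal sketch of the joint audio-visual HMM idea, not the paper's actual implementation. It assumes the third-party hmmlearn library, stand-in feature dimensionalities, and random placeholder arrays in place of real, time-aligned MFCC/FAP streams. Training fits one Gaussian HMM on concatenated MFCC+FAP frames; synthesis decodes a novel MFCC track using only the audio dimensions of each state's Gaussian, then emits the FAP part of the decoded states' means as the animation trajectory.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

N_STATES = 16           # assumed number of hidden states
D_MFCC, D_FAP = 13, 68  # assumed feature dimensionalities

# --- Training phase: learn a single HMM over joint audio-visual features ---
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((500, D_MFCC))  # placeholder MFCC frames (T x D_MFCC)
fap = rng.standard_normal((500, D_FAP))    # placeholder aligned FAP frames (T x D_FAP)
joint = np.hstack([mfcc, fap])             # concatenate per-frame audio and video features
hmm = GaussianHMM(n_components=N_STATES, covariance_type="diag",
                  n_iter=20, random_state=0)
hmm.fit(joint)

# --- Synthesis phase: decode a novel audio track into a FAP sequence ---
# Build an audio-only HMM that reuses the trained start probabilities and
# transitions but keeps only the MFCC dimensions of each state's Gaussian.
audio_hmm = GaussianHMM(n_components=N_STATES, covariance_type="diag")
audio_hmm.startprob_ = hmm.startprob_
audio_hmm.transmat_ = hmm.transmat_
audio_hmm.means_ = hmm.means_[:, :D_MFCC]
# The covars_ getter returns full matrices; keep the MFCC diagonal entries.
audio_hmm.covars_ = np.array([np.diag(c)[:D_MFCC] for c in hmm.covars_])

# Viterbi-decode the novel audio, then read off the FAP-part mean of each
# decoded state as the synthesized FAP trajectory for the animation engine.
novel_mfcc = rng.standard_normal((120, D_MFCC))  # stand-in for a new track
states = audio_hmm.predict(novel_mfcc)
fap_sequence = hmm.means_[states, D_MFCC:]       # (120, D_FAP), fed to the FAE
```

Decoding with the audio marginal and emitting state-mean FAPs is one standard simplification of HMM-based audio-to-visual conversion; the system described above may use a different topology, smoothing, or estimation procedure.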