Face perception is a major topic in vision research. Most previous research has concentrated on (holistic) spatial representations of faces, often with static faces as stimuli. However, faces are highly dynamic stimuli containing important temporal information. How sensitive humans are to temporal information in dynamic faces is not well understood. Studies investigating temporal information in dynamic faces usually focus on the processing of emotional expressions. However, faces also contain relevant temporal information in the absence of any strong emotional expression. To investigate cues that modulate human sensitivity to temporal order, we used muted videos of dynamic neutral faces in two experiments. We varied the orientation of the faces (upright and inverted) and the presence or absence of eye blinks as partial dynamic cues. Participants viewed short, muted, monochrome videos of models vocalizing a widely known text (the National Anthem). Videos were played either forward (in the correct temporal order) or backward. Participants were asked to report the temporal direction of each video and, at the end of the experiment, whether they had understood the speech. Face orientation and the presence or absence of an eye blink affected sensitivity, criterion (bias), and reaction time. Overall, sensitivity was higher for upright than for inverted faces, and higher when an eye blink was present than when it was absent. Reaction times were mostly faster in the conditions with higher sensitivity. A bias to report inverted faces as ‘backward’ was observed in Experiment I, where upright and inverted faces were randomly interleaved within each block; this bias was absent in Experiment II, where upright and inverted faces were presented in separate blocks. In both experiments, sensitivity was higher when the speech was understood than when it was not.
Taken together, our results showed higher sensitivity for upright than for inverted faces, suggesting that the perception of dynamic, task-relevant information was superior with the canonical orientation of the faces. Furthermore, partial information from eye blinks, in addition to mouth movements, seemed to play a significant role in dynamic face perception for both upright and inverted faces. We suggest that studying the perception of facial dynamics beyond emotional expressions will help us to better understand the mechanisms underlying the temporal integration of facial information from different (partial and holistic) sources. Our results also show that human observers employ different strategies, depending on the available information, when judging the temporal order of faces.
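
For readers unfamiliar with the sensitivity and criterion measures mentioned above, the following is a minimal sketch of how such values are commonly derived from forward/backward judgments. It assumes the standard equal-variance signal detection measures d′ and c with a log-linear correction; the abstract does not state which estimator was used, and the function name `sdt_measures` and all trial counts are hypothetical illustrations, not data from the experiments. Here a "hit" is a forward video reported as forward, and a "false alarm" is a backward video reported as forward.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one condition.

    Assumption: equal-variance Gaussian signal detection model,
    with 'forward' videos treated as the signal class.
    """
    # Log-linear correction keeps rates away from 0 and 1,
    # where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    # Positive criterion = tendency to respond 'backward',
    # negative criterion = tendency to respond 'forward'.
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for illustration only (50 forward, 50 backward trials).
d_upright, c_upright = sdt_measures(40, 10, 12, 38)
d_inverted, c_inverted = sdt_measures(25, 25, 15, 35)
print(f"upright:  d'={d_upright:.2f}, c={c_upright:.2f}")
print(f"inverted: d'={d_inverted:.2f}, c={c_inverted:.2f}")
```

Under this convention, a positive criterion in the inverted condition would correspond to the kind of bias toward ‘backward’ responses described for Experiment I.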