Stereopsis is a powerful depth cue for humans and may also contribute to object recognition. In particular, we surmise that face identification benefits from the availability of stereoscopic depth cues, since face perception may be based on three-dimensional (3D) representations. In this study, a virtual reality (VR) headset with integrated eye-tracking was used to present stereoscopic images of faces. As a monoscopic contrast condition, identical images of faces were displayed to the two eyes. We monitored participants' gaze behavior and pupil diameters while they performed a sample-to-match face identification task. Accuracy was higher in the stereoscopic condition than in the monoscopic condition for frontal and intermediate views, but not for profiles. Moreover, pupil diameters were smaller when participants identified stereoscopically presented faces than when they identified faces presented without stereoscopic cues, which we interpret as a lower processing load in the stereoscopic condition. The analysis of gaze showed that participants tended to focus on regions of the face rich in volumetric information, more so in the stereoscopic condition than in the monoscopic condition. Together, these findings suggest that a 3D representation of faces may be the natural format used by the visual system when assessing face identity. By providing depth information, stereoscopic cues assist the construction of robust face representations in memory.