Voice and face processing occur through convergent neural systems that facilitate speaker recognition. Neuroimaging studies suggest that familiar voice processing engages high-level visual cortex, including the bilateral fusiform gyrus (FG) on the basal temporal lobe. However, what role the FG plays in voice processing, and whether it is driven by bottom-up or top-down mechanisms, remains unresolved. In this study, we examined neural responses to famous voices and faces in human FG using direct cortical surface recordings (electrocorticography) in epilepsy surgery patients. We tested the hypothesis that neural populations in human FG respond to famous voices and investigated the temporal properties of voice responses in FG. Recordings were acquired from five adult participants during a person identification task using visual and auditory stimuli from famous speakers (U.S. Presidents Barack Obama, George W. Bush, and Bill Clinton). Patients were presented with images of the presidents or clips of their voices and asked to identify the person pictured or speaking. Our results demonstrate that a subset of face-responsive sites in and near FG also exhibit voice responses that are both lower in magnitude and delayed (300-600 ms) compared with visual responses. The dynamics of voice processing revealed by direct cortical recordings suggest a top-down, feedback-mediated response to famous voices in FG that may facilitate speaker identification.

NEW & NOTEWORTHY Interactions between auditory and visual cortices play an important role in person identification, but the dynamics of these interactions remain poorly understood. We performed direct brain recordings of fusiform face cortex in human epilepsy patients performing a famous voice naming task, revealing the dynamics of famous voice processing in human fusiform face cortex. The findings support a model of top-down interactions from auditory to visual cortex to facilitate famous voice recognition.