ABSTRACT
Three experiments are reported that examined the capacity to match a voice with a static image of a face. In a simultaneous same/different matching task, performance was significantly better than chance (Experiments 1 and 2). However, it did not appear to depend on sex of speaker, sex of listener, stimulus distinctiveness, or self-reported strategies (Experiment 2). Concerns over floor effects, as well as a significant response bias, prompted a change of task, and when performance was examined through matching a voice to a face lineup, a more interesting pattern emerged. Again, performance was significantly better than chance, but in addition, it was demonstrably affected by the distinctiveness of the speaker’s voice. These results are considered in the context of theoretical discussions regarding face–voice integration, and in the context of more applied considerations regarding multimodal benefits in witness scenarios.