Vocal fold control was critical to the evolution of spoken language, much as it allows us today to learn vowel systems. It has, however, never been demonstrated directly in a non-human primate, leading to the suggestion that it evolved in the human lineage after divergence from great apes. Here, we provide the first evidence for real-time, dynamic and interactive vocal fold control in a great ape during an imitation “do-as-I-do” game with a human demonstrator. Notably, the orang-utan subject skilfully produced “wookies” – an idiosyncratic vocalization exhibiting a unique spectral profile among the orang-utan vocal repertoire. The subject instantaneously matched human-produced wookies as they were randomly modulated in pitch, adjusting his voice frequency up or down when the human demonstrator did so, readily generating distinct low vs. high frequency sub-variants. These sub-variants were significantly different from spontaneous ones (not produced in matching trials). Results indicate a latent capacity for vocal fold exercise in a great ape (i) in real-time, (ii) up and down the frequency spectrum, (iii) across a register range beyond the species repertoire, and (iv) in a co-operative turn-taking social setup. Such ancestral capacity likely provided the neuro-behavioural basis of the more fine-tuned vocal fold control that is a human hallmark.