We investigated how a listener's perceived meaning of a spoken sentence is influenced by the relative timing between a speaker's speech and accompanying hand gestures. Participants viewed a computer-animated character who uttered the phrase “Put the book there now” while executing a simple right-handed beat gesture whose temporal position relative to the utterance was precisely controlled frame by frame. Participants judged two related aspects of the character's perceived speech: (a) Which word was emphasized? and (b) How clear was the emphasis (i.e., did it make sense)? The results revealed that the perceived emphasis was determined by the timing (phasing) of the speaker's hand gesture. Furthermore, the clarity of the perceived emphasis (i.e., its meaningfulness) was influenced by the affordances in the speaker's immediate environment. Discussion addresses the primacy of ostensive specification and gesture in communicative events, the dynamics of speech-hand coordination during both actual and virtual dialogue, and the role of environmental affordances in grounding informative communicative acts in the ecology of organism-environment dynamics.