Adolescents self-report using different strategies to respond to peer provocation. However, we have a limited understanding of how these responses are behaviorally enacted and perceived by peers. This study examined the extent to which adolescents' self-reported responses to peer provocation (i.e., aggressive, assertive, and withdrawn) predicted how their vocal enactments of standardized responses to peer provocation were perceived by other adolescents. Three vocal cues relevant to the communication of emotional intent (average pitch, average intensity, and speech rate) were explored as moderators of these associations. Adolescent speakers (n = 39; mean age = 12.67; 66.7% girls) completed a self-report measure of how they would choose to respond to scenarios involving peer provocation; they also enacted standardized vocal responses to hypothetical peer provocation scenarios. Recordings of speakers' vocal responses were presented to a separate sample of adolescent listeners (n = 129; mean age = 12.12; 52.7% girls) in an online listening task. Speakers who self-reported greater use of assertive response strategies enacted standardized vocal responses that were rated as significantly friendlier by listeners. Vocal responses enacted with faster speech rates were also rated as significantly friendlier by listeners. Speakers' self-reported use of aggression and withdrawal was not significantly related to listeners' ratings of their standardized vocal responses. These findings suggest that adolescents may be perceived differently by their peers depending on how their responses are enacted; specifically, a faster speech rate may be perceived as friendlier and may thus help de-escalate peer conflict. Future studies should consider not only what youth say and/or do when responding to peer provocation but also how they say it.