Body gestures are an important non-verbal expression channel during affective communication. They convey human attitudes and emotions as these dynamically unfold during an interpersonal interaction. Hence, it is highly desirable to understand the dynamics of body gestures associated with emotion expression in human interactions. We present a statistical framework for robustly modeling the dynamics of body gestures in dyadic interactions. Our framework is based on high-level semantic gesture patterns and consists of three components. First, we construct a universal background model (UBM) using Gaussian mixture modeling (GMM) to represent subject-independent gesture variability. Next, we describe each gesture sequence as a concatenation of semantic gesture patterns derived from a parallel HMM structure. Then, we probabilistically compare the segments of each gesture sequence extracted in the second step with the UBM obtained in the first step, in order to select highly probable gesture patterns for the sequence. The dynamics of each gesture sequence are represented by a statistical variation profile computed from the selected patterns, and are further described in a well-defined kernel space. This framework is compared with three baseline models and is evaluated in emotion recognition experiments, i.e., recognizing the overall emotional state of a participant in a dyadic interaction from the gesture dynamics. The recognition performance demonstrates the superiority of the proposed framework over the baseline models. The analysis of the relationship between the emotion recognition performance and the number of selected segments also indicates that a few local salient events, rather than the whole gesture sequence, are sufficiently informative for humans to form their overall emotion perception.
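The core of the pipeline (scoring gesture segments against a GMM-based UBM, keeping the most probable ones, and summarizing them as a variation profile) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes the UBM is a diagonal-covariance GMM whose parameters are already fitted, that each segment arrives as a frame-by-feature matrix, and that the variation profile is a simple per-dimension standard deviation. All function names here (`gmm_loglik`, `select_salient_segments`, `variation_profile`) are hypothetical.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Log-likelihood of each frame under a diagonal-covariance GMM (the UBM).

    X: (n, d) frames; weights: (k,); means, variances: (k, d).
    """
    diff = X[:, None, :] - means[None, :, :]                       # (n, k, d)
    log_comp = -0.5 * np.sum(diff**2 / variances
                             + np.log(2 * np.pi * variances), axis=2)  # (n, k)
    return np.logaddexp.reduce(np.log(weights) + log_comp, axis=1)     # (n,)

def select_salient_segments(segments, weights, means, variances, top_k=3):
    """Score each gesture segment by its mean frame log-likelihood under the
    UBM and keep the top_k most probable ones (the selection step)."""
    scores = [gmm_loglik(seg, weights, means, variances).mean()
              for seg in segments]
    order = np.argsort(scores)[::-1]
    return [segments[i] for i in order[:top_k]]

def variation_profile(selected):
    """Per-dimension standard deviation over the frames of the selected
    segments: a simple stand-in for the statistical variation profile."""
    frames = np.vstack(selected)
    return frames.std(axis=0)
```

In a real system the resulting profile would then feed a kernel-based classifier; here it is left as a plain feature vector.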