Previous research has acknowledged the effect of prosody on inter-gestural coordination, but the effect of tones in particular remains understudied. This paper has a two-fold purpose. First, it aims to explore the effects of the Swedish word accents on word-initial consonant-vowel (CV) coarticulation. Second, it aims to revisit the existing evidence for tonal integration. Based on Articulatory Phonology, it has been suggested that tones in tone languages are integrated into the gestural organization of a syllable-initial CV sequence in the same manner as an additional consonant would be (CCV), as indicated by a time lag between the gestural onsets of the C and the V gestures (CV onset time lag). However, we argue that the existing evidence is inconclusive, because previous cross-linguistic research has used small-scale data sets (one to seven speakers), and there is still no well-grounded consensus on how gestural onsets are to be measured. This study uses Electromagnetic Articulography (EMA) to investigate word-initial CV coordination in a lexical pitch-accent language (Swedish) with a binary tonal word accent distinction: a tonal fall and a tonal rise. A selection of 13 spatial, temporal, or coordinative measures of bilabial and tongue body data from 19 speakers, and acoustic f0 data, were examined to study the CV sequence /ma/. Mixed effects regression models revealed a longer tongue body movement in the rising tone context and small but significant differences between the two tonal contexts in tongue body height, in the closing and the opening of the lips, and in the CV onset time lag. We argue that these effects are biomechanical in nature, due to the physiological connections between the tongue, the jaw, and the larynx. In addition, our results suggest either synchronized CV onsets or a CV onset time lag (as in tone languages), depending on the timing landmarks used.
To evaluate such results as evidence for or against the integration of tone in CV coarticulation, we argue that future research needs to compare data from a variety of languages using a considerable number of speakers. The present study provides new reference values for such comparisons.