The majority of the world’s languages exploit consonants, vowels and lexical tones to contrast the meanings of individual words. However, most experimental research on early language development focuses on consonant–vowel languages. In the present study, the roles of consonants, vowels and lexical tones in emergent word knowledge were directly compared in toddlers (2.5–3.5 years) and preschoolers (4–5 years) who were bilingual native learners of a consonant–vowel–tone language (Mandarin Chinese). Using a preferential looking paradigm, participants were presented with correct pronunciations and with consonant, vowel, and tone variations of known words. Responses to each type of variation were assessed via gaze fixations to a visual target. When labels were correctly pronounced, both age groups reliably identified the visual targets. However, toddlers were more sensitive to mispronunciations involving lexical tones than to those involving consonants or vowels. This pattern was reversed in preschoolers, who were more sensitive to consonant and vowel variation than to tone variation. Findings are discussed in terms of the properties of tones, vowels and consonants and the respective role of each source of variation in tone languages.