This study asks whether oral proficiency in second language (L2) Japanese speech is uniquely correlated with acoustic characteristics of rhythm production, independently of segmental production. Of the four rhythm measures used (V%, VarcoV, VarcoC, VI‐M), only two, VarcoV and VarcoC, differed significantly (p < 0.05) between the spontaneous L2 Japanese speech of beginning and intermediate learners. The interlanguage rhythm of the less proficient speakers was characterized by lower variability in the duration of vocalic stretches (VarcoV) and higher variability in the duration of consonantal stretches (VarcoC). For both measures, the distribution of individual speakers’ rhythm scores was much tighter and closer to the target for the intermediate students than for the beginning students. Furthermore, VarcoV values were significantly correlated with the number of utterance‐final vowels, and VarcoC values were correlated with the number of obstruent clusters. In sum, the findings suggest that rhythmic differences in spontaneous L2 speech are epiphenomenal, stemming from the segmental structure of Japanese: the acquisition of mora‐timed rhythm by learners of Japanese seems to be contingent upon target‐like production of segments, which varies with learners’ proficiency level.
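
For readers unfamiliar with interval-based rhythm measures, the sketch below shows how V%, VarcoV, and VarcoC are conventionally computed from the durations of vocalic and consonantal intervals. It is a minimal illustration and not the study's own analysis procedure. The function names, the sample durations, and the use of the sample (n − 1) standard deviation are assumptions made here. VI‐M is omitted because its definition is not given in this text.

```python
from statistics import mean, stdev


def percent_v(vocalic, consonantal):
    """V%: share of total duration occupied by vocalic intervals, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def varco(durations):
    """Varco: standard deviation of interval durations divided by their mean,
    multiplied by 100. Assumption: sample (n - 1) standard deviation."""
    return 100.0 * stdev(durations) / mean(durations)


# Hypothetical interval durations in milliseconds (not data from the study).
vocalic_ms = [80, 95, 70, 160]
consonantal_ms = [60, 110, 55]

scores = {
    "V%": percent_v(vocalic_ms, consonantal_ms),
    "VarcoV": varco(vocalic_ms),      # rate-normalized vocalic variability
    "VarcoC": varco(consonantal_ms),  # rate-normalized consonantal variability
}

for name, value in scores.items():
    print(f"{name}: {value:.1f}")
```

Because Varco divides by the mean interval duration, it partly normalizes for speech rate, which matters when comparing learners who speak at different tempos.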