The /ɹ/ productions of young children acquiring American English are highly variable and often inaccurate, with [w] as the most common substitution error. One acoustic indicator of the goodness of children's /ɹ/ productions is the difference between the frequencies of the second formant (F2) and the third formant (F3), with a smaller F3-F2 difference being associated with a perceptually more adultlike /ɹ/. This study compared automatically extracted F3-F2 differences with manually coded measurements to evaluate how effectively the automated measure characterizes young children's productions of /ɹ/ and /w/. Automated F3-F2 differences were extracted from productions of a variety of /ɹ/- and /w/-initial words spoken by 3- to 4-year-old monolingual preschoolers (N = 117; 2,278 tokens in total). These automated measures were compared with ratings of the phoneme goodness of the children's productions, made by untrained adult listeners (n = 132) on a visual analog scale, and with narrow transcriptions that assigned each production to one of four categories: [ɹ], [w], and two intermediate categories. Data visualizations show a weak relationship between automated F3-F2 differences and both listener ratings and narrow transcriptions. Mixed-effects models suggest that the automated F3-F2 difference only modestly predicts listener ratings (R² = .37) and narrow transcriptions (R² = .32). The weak relationship between the automated F3-F2 difference and both listener ratings and narrow transcriptions suggests that these automated acoustic measures are of questionable reliability and utility in assessing preschool children's mastery of the /ɹ/-/w/ contrast.
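
To make the F3-F2 measure concrete, the following is a minimal Python sketch of how a per-token difference could be computed once formant estimates are available. It is an illustration only, not the study's extraction pipeline: the abstract does not name the tool or the measurement window used, so the function names (`f3_f2_difference`, `token_f3_f2`), the choice of a median over frames, and all formant values are assumptions introduced here.

```python
from statistics import median


def f3_f2_difference(f2_hz: float, f3_hz: float) -> float:
    """Return F3 minus F2 (in Hz) for one measurement point."""
    if f2_hz <= 0 or f3_hz <= 0:
        raise ValueError("formant frequencies must be positive")
    return f3_hz - f2_hz


def token_f3_f2(f2_track: list[float], f3_track: list[float]) -> float:
    """Summarize one token as the median F3-F2 difference over paired frames.

    Using the median is an assumption made for this sketch; it simply damps
    the effect of occasional formant-tracking errors in individual frames.
    """
    if not f2_track or len(f2_track) != len(f3_track):
        raise ValueError("tracks must be non-empty and of equal length")
    return median(f3_f2_difference(f2, f3) for f2, f3 in zip(f2_track, f3_track))


# Hypothetical frame-level estimates (Hz), invented purely for illustration.
r_like = token_f3_f2([1600.0, 1650.0, 1700.0], [2100.0, 2150.0, 2250.0])
w_like = token_f3_f2([900.0, 950.0, 1000.0], [2900.0, 2950.0, 3000.0])

# A smaller difference is the pattern associated with a more adultlike /ɹ/.
print(r_like, w_like)
```

In this toy example the /ɹ/-like token yields a much smaller F3-F2 difference than the /w/-like token, which is the direction of the association described above; real child speech would show far more overlap, consistent with the weak relationships reported.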