Abstract

In a previous report [VanDam and Silbert (2013) POMA 19, 060006], we investigated the performance of a commercially available automatic speech recognition (ASR) system [LENA Research Foundation, Boulder, CO] on acoustic recordings of family speech in naturalistic environments. We found that the ASR system labeled children more accurately than adults and fathers more accurately than mothers, and that human judges' labels showed substantial individual variation. The present work extends that report by investigating possible sources of both machine and human labeling decisions. Classification tree models were fit to several acoustic variables for machine and human labels of CHILD, MOTHER, and FATHER. Results suggest that (a) fundamental frequency (f0) and duration measures influenced label assignment for both machine and human classifications, (b) the error of the fitted models is lower for the machine labeling procedure than for human judges, (c) machine and human decision processes use the acoustic criteria (i.e., f0 and duration) differently, and (d) f0 is more important than duration for all labelers. Results may have implications for improving the implementation and interpretation of ASR techniques, especially as they are applied to child language research and to very large, naturalistic datasets that demand unsupervised ASR techniques.
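
The abstract does not state which software, splitting criterion, or tree depth was used, so the following is only a minimal sketch of how a classification tree might be fit to two acoustic variables (mean f0 and segment duration) to predict a CHILD, MOTHER, or FATHER label. It assumes Gini impurity as the splitting criterion. The function names and the nine data points are hypothetical, invented purely for illustration, and are not taken from the study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) for the best binary split, or None."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def fit_tree(rows, labels, depth=0, max_depth=2):
    """Recursively grow a tree; a leaf is a label, an internal node is a 4-tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:  # all rows identical, nothing left to split on
        return majority
    j, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            fit_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            fit_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Hypothetical toy data (NOT from the study): (mean f0 in Hz, duration in s).
rows = [(110, 1.2), (125, 0.9), (118, 1.5),
        (210, 1.1), (225, 1.4), (200, 0.8),
        (310, 0.6), (340, 0.5), (295, 0.7)]
labels = ["FATHER"] * 3 + ["MOTHER"] * 3 + ["CHILD"] * 3

tree = fit_tree(rows, labels)
# Proportion of training segments the fitted tree mislabels.
training_error = sum(predict(tree, r) != y for r, y in zip(rows, labels)) / len(rows)
```

Under this setup, fitting one tree to the machine labels and another to each human judge's labels, then comparing their error rates and the variables chosen at each split, would loosely mirror the comparison described above. A real analysis would use held-out data and an established tree implementation.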
