Abstract

Speech communication commonly occurs in the presence of multiple, non-target talkers. Previous work has shown that the amount of glimpsed target speech is a good predictor of the overall intelligibility of babble-masked speech (Brungart et al., 2006, JASA), and that an automatic speech recognition system trained on glimpses closely approximates listener accuracy as a function of the number of babble talkers (Cooke, 2006, JASA). The present work uses several machine learning models to analyze the auditory information available in babble-masked speech and in the glimpses recoverable from it. Regularized Linear Discriminant Analysis, Support Vector Machine, and Naive Bayes classifiers were fit to modeled auditory representations of babble-masked CV syllables (with C = t, d, s, z, and V = a). The machine learning models substantially outperformed human listeners (>70% versus 54% accuracy, respectively). Analysis of predicted confusion patterns indicates that the Naive Bayes model most closely approximates human error patterns. The effects of varying the local and absolute thresholds for glimpse calculation are explored with respect to overall accuracy and error pattern prediction in these machine learning models.
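The glimpse calculation referred to above can be sketched as a simple time-frequency mask: a cell counts as a glimpse when the target locally exceeds the masker by some threshold and also clears an absolute audibility floor. The function name and threshold values below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def glimpse_mask(target_db, masker_db, local_thresh_db=3.0, abs_thresh_db=-20.0):
    """Boolean mask of 'glimpsed' time-frequency cells.

    A cell is glimpsed when the target exceeds the masker by at least
    local_thresh_db AND the target exceeds an absolute floor abs_thresh_db.
    Threshold values here are illustrative assumptions.
    """
    return (target_db - masker_db >= local_thresh_db) & (target_db >= abs_thresh_db)

# Deterministic toy example: target at 0 dB everywhere, masker at -10 dB,
# over a 64-frequency-bin by 100-frame "spectrogram".
target = np.zeros((64, 100))
masker = np.full((64, 100), -10.0)
mask = glimpse_mask(target, masker)
print(mask.mean())  # proportion of glimpsed cells -> 1.0 here (local SNR = 10 dB)
```

Raising the local threshold shrinks the glimpsed region, which is the kind of manipulation the abstract describes when exploring how threshold settings affect classifier accuracy and error patterns.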
