Abstract

Assessment of speech intelligibility in noise is critical for measuring the impact of age-related hearing loss. However, quantifying intelligibility often requires a human to manually process responses provided by a participant or patient to obtain a speech-intelligibility score – typically the proportion of correctly heard words. This manual process can be time-consuming and thus costly. The current study investigates whether state-of-the-art Natural Language Processing (NLP) models from Google and OpenAI could be used to calculate speech-intelligibility scores as an alternative to human scoring. It was specifically tested whether NLP models capture common speech-in-noise perception phenomena in younger and older adults (N = 144) listening to speech masked by modulated or unmodulated babble noise. The results show that NLP speech-intelligibility scores closely matched intelligibility scores from a human scorer (r ∼0.95). The main difference is an average underestimation of ∼2% by NLP scores relative to human scores at moderate to high signal-to-noise ratios. This underestimation results from participants making minor errors related to misspellings, gender, or tense, to which NLP models are sensitive but which human scorers typically correct prior to scoring. Critically, NLP models capture the known age-related reduction in intelligibility and the age-related reduction in the benefit from a modulated relative to an unmodulated masker. OpenAI's ADA2 appears to perform the best of the tested NLP models, showing no difference in the speech-in-noise phenomena compared to human scoring. The current study suggests that modern NLP models can be used to score speech-intelligibility data.
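The abstract contrasts the conventional human score (proportion of correctly heard words) with NLP-based scoring via text embeddings such as OpenAI's ADA2. As a minimal illustrative sketch, not the authors' actual pipeline, the two approaches can be expressed as a word-overlap proportion and a cosine similarity between embedding vectors (how the embeddings are obtained is left abstract; the function names are hypothetical):

```python
import math
from collections import Counter

def proportion_correct(target: str, response: str) -> float:
    """Human-style score: proportion of target words reproduced in the
    typed response, ignoring word order. A literal match is sensitive to
    minor spelling/tense errors, mirroring the ~2% underestimation the
    abstract attributes to automatic scoring."""
    target_words = Counter(target.lower().split())
    response_words = Counter(response.lower().split())
    hits = sum(min(n, response_words[w]) for w, n in target_words.items())
    return hits / sum(target_words.values())

def cosine_similarity(a, b) -> float:
    """NLP-style score: cosine similarity between two embedding vectors
    (e.g., target-sentence and response embeddings from a model like ADA2)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Example: a tense error ("sits" vs. "sat") lowers the literal word score.
score = proportion_correct("the cat sat", "the cat sits")  # 2 of 3 words match
```

In practice, correlating such automatic scores with human scores across participants (as in the reported r ∼0.95) would indicate whether the NLP score is a usable substitute.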
