Acoustic vocoders play a key role in simulating the speech information available to cochlear implant (CI) users. Traditionally, the intelligibility of vocoded CI simulations is assessed through speech recognition experiments with normal-hearing subjects, a process that is time-consuming, costly, and subject to individual variability. As an alternative, we used an advanced deep learning speech recognition model to investigate the intelligibility of CI simulations. We evaluated the model's performance on vocoder-processed words and sentences under varying vocoder parameters. The number of vocoder bands, the frequency range, and the envelope dynamic range were adjusted to simulate sound-processing settings in CI devices. Additionally, we manipulated the low-pass cutoff frequency and intensity quantization of the vocoder envelopes to simulate the psychophysical temporal and intensity resolution of CI patients. The results were interpreted in the context of the audio analysis performed within the model. Interestingly, the deep learning model, despite not being originally designed to mimic human speech processing, exhibited human-like responses to changes in vocoder parameters, resembling existing results from human subjects. This approach offers substantial time and cost savings over testing human subjects and eliminates learning and fatigue effects during testing. Our findings demonstrate the potential of speech recognition models to facilitate auditory research.
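To illustrate the kind of vocoder processing described above, the following is a minimal sketch of a noise-excited channel vocoder in Python. It exposes the parameters mentioned in the abstract (number of bands, frequency range, envelope low-pass cutoff, dynamic range, and intensity quantization). All function and parameter names here (e.g., `vocode`, `n_env_steps`, `dynamic_range_db`) are illustrative assumptions, not the authors' implementation, and the Greenwood-map band spacing is one common choice rather than the specific analysis used in the study.

```python
# Minimal noise-excited channel vocoder sketch (illustrative only; not the authors' code).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def greenwood_edges(n_bands, f_lo, f_hi):
    """Band edges spaced according to an approximate cochlear (Greenwood) frequency map."""
    A, a, k = 165.4, 2.1, 0.88  # standard human Greenwood constants
    x_lo = np.log10(f_lo / A + k) / a
    x_hi = np.log10(f_hi / A + k) / a
    x = np.linspace(x_lo, x_hi, n_bands + 1)
    return A * (10 ** (a * x) - k)

def vocode(signal, fs, n_bands=8, freq_range=(100.0, 8000.0),
           env_cutoff_hz=50.0, dynamic_range_db=40.0, n_env_steps=None):
    """Analyze `signal` into bands, extract envelopes, and resynthesize with noise carriers."""
    edges = greenwood_edges(n_bands, *freq_range)
    env_sos = butter(2, env_cutoff_hz, btype="low", fs=fs, output="sos")
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)
        # Envelope: rectification followed by low-pass filtering (sets temporal resolution).
        env = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0.0, None)
        # Compress to the chosen dynamic range and optionally quantize intensity steps.
        peak = env.max() + 1e-12
        env_db = 20 * np.log10(np.maximum(env / peak, 10 ** (-dynamic_range_db / 20)))
        if n_env_steps is not None:  # simulate limited intensity resolution
            step = dynamic_range_db / (n_env_steps - 1)
            env_db = np.round(env_db / step) * step
        env = peak * 10 ** (env_db / 20)
        # Modulate a band-limited noise carrier with the processed envelope.
        carrier = sosfiltfilt(band_sos, np.random.randn(len(signal)))
        out += env * carrier
    return out / (np.abs(out).max() + 1e-12)
```

Under this sketch, the vocoded waveform would then be transcribed by any off-the-shelf speech recognition model and scored against the reference text to estimate intelligibility, mirroring the workflow described in the abstract.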