Abstract

Combined electric and acoustic stimulation (EAS) has demonstrated better speech recognition than conventional cochlear implants (CIs) and yields satisfactory performance under quiet conditions. However, when noise is involved, both the electric and the acoustic signal may be distorted, resulting in poor recognition performance. To suppress noise effects, speech enhancement (SE) is a necessary unit in EAS devices. Recently, a time-domain SE algorithm based on a fully convolutional neural network (FCN) with a short-time objective intelligibility (STOI)-based objective function (termed FCN(S) for short) has received increasing attention owing to its simple structure and effectiveness in restoring clean speech signals from their noisy counterparts. Given the evidence of the benefits of FCN(S) for normal speech, this study sets out to assess its ability to improve the intelligibility of EAS-simulated speech. Objective evaluations and listening tests were conducted to examine how well FCN(S) improves the intelligibility of normal and vocoded speech in noisy environments. The experimental results show that, compared with the traditional minimum mean-square error (MMSE) SE method and the deep denoising autoencoder (DDAE) SE method, FCN(S) yields larger gains in speech intelligibility for both normal and vocoded speech. This study, the first to evaluate deep-learning SE approaches for EAS, confirms that FCN(S) is an effective SE approach that could be integrated into an EAS processor to benefit users in noisy environments.

Highlights

  • A cochlear implant (CI) is a surgically implanted electronic device that stimulates auditory nerves to provide a sense of sound for people with severe-to-profound sensorineural hearing loss.

  • This study is the first to investigate the effectiveness of deep-learning-based speech enhancement (SE) methods on electric and acoustic stimulation (EAS) simulated speech.

  • We focus on comparisons between the recently developed fully convolutional neural network FCN(S) SE approach, a conventional minimum mean-square error (MMSE) SE approach, and a deep-learning-based deep denoising autoencoder (DDAE) SE approach at two SNRs in engine and street noise environments.


Summary

INTRODUCTION

A cochlear implant (CI) is a surgically implanted electronic device that stimulates auditory nerves to provide a sense of sound for people with severe-to-profound sensorineural hearing loss. Fu et al. [63] proposed the use of a fully convolutional neural network (FCN) model for SE in the time domain, which can preserve the neighbouring information of a speech waveform to generate high- and low-frequency components. Their experimental results show that, compared with CNN and deep neural network models, the FCN model yields better speech intelligibility in terms of short-time objective intelligibility (STOI) with fewer parameters. Experimental results confirmed that the DDAE-based method outperforms three commonly used single-microphone SE approaches (logMMSE, KLT, and Wiener filter) in terms of intelligibility, evaluated with STOI, and speech recognition, evaluated with listening tests. These results confirmed the potential of applying deep learning models to improve CI devices.
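The key property of a time-domain FCN enhancer is that stacked 1-D convolutions with "same" padding map a raw noisy waveform directly to an enhanced waveform of identical length, with no fully connected layers. The sketch below illustrates only this structural idea in plain NumPy; it is not the authors' implementation, and the layer count, kernel widths, and channel sizes are illustrative assumptions (the weights are random and untrained).

```python
import numpy as np

def conv1d(x, kernels, padding):
    """Naive 1-D convolution. x: (c_in, T); kernels: (c_out, c_in, width)."""
    c_out, c_in, w = kernels.shape
    xp = np.pad(x, ((0, 0), (padding, padding)))
    t_out = xp.shape[1] - w + 1
    out = np.zeros((c_out, t_out))
    for o in range(c_out):
        for i in range(c_in):
            for k in range(w):
                out[o] += kernels[o, i, k] * xp[i, k:k + t_out]
    return out

def fcn_enhance(noisy, layers):
    """Pass a raw waveform through stacked 1-D conv layers.
    'Same' padding (width // 2) keeps every layer's output the same
    length as its input, so the network maps noisy samples directly
    to enhanced samples in the time domain."""
    h = noisy[np.newaxis, :]                 # (1, T): single-channel waveform
    for kernels in layers[:-1]:
        h = np.maximum(conv1d(h, kernels, kernels.shape[2] // 2), 0)  # ReLU
    # Final layer is linear with one output channel: the enhanced waveform.
    h = conv1d(h, layers[-1], layers[-1].shape[2] // 2)
    return h[0]

rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((8, 1, 11)) * 0.1,   # 1 -> 8 channels
    rng.standard_normal((8, 8, 11)) * 0.1,   # 8 -> 8 channels
    rng.standard_normal((1, 8, 11)) * 0.1,   # 8 -> 1 channel (waveform out)
]
noisy = rng.standard_normal(1600)            # 0.1 s of audio at 16 kHz
enhanced = fcn_enhance(noisy, layers)
print(enhanced.shape)                        # same length as the input: (1600,)
```

In training, such a network would be optimized end-to-end; the STOI-based objective of FCN(S) replaces a plain sample-wise loss with one correlated with intelligibility, but the forward mapping keeps this fully convolutional shape.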

VOCODED SPEECH
EXPERIMENTAL SETUP AND RESULTS
Evaluation on Normal Speech
Evaluation on Vocoded Speech
Findings
CONCLUSION
