Abstract

Bone-conducted (BC) speech can avoid the degradation that background noise and reverberation cause in recorded speech; however, BC speech has lower quality and intelligibility than air-conducted (AC) speech. Because large-scale BC speech databases are hard to obtain (a low-resource setting), current BC speech enhancement methods barely improve the speech of speakers outside the training dataset. We proposed a method for enhancing BC speech from speakers outside the training dataset in this low-resource scenario. The proposed method comprised a feature conversion model based on a vector-quantized variational autoencoder that incorporated gammachirp filterbank cepstral coefficients. It exploited a large-scale clean AC speech database to improve the quality of the BC speech. We evaluated the effectiveness of the proposed method with three measures: perceptual evaluation of speech quality, short-time objective intelligibility, and the syllable error rate of an automatic speech recognition system. The results indicated that the proposed method improved the sound quality and intelligibility of BC speech from speakers outside the training dataset.
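
As an illustration of the third measure, the sketch below computes a syllable error rate in the way error rates are commonly defined for speech recognition: the edit distance between the reference and recognized syllable sequences, divided by the reference length. The abstract does not give the exact definition used in the paper, so this formulation, the function names `edit_distance` and `syllable_error_rate`, and the sample syllables are all assumptions made for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two syllable sequences."""
    # prev[j] holds the distance between the first i-1 reference
    # syllables and the first j hypothesis syllables.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference syllable
                curr[j - 1] + 1,     # insertion of a hypothesis syllable
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def syllable_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed definition: (substitutions + deletions + insertions) / len(ref)."""
    if not ref:
        raise ValueError("reference must contain at least one syllable")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one deleted and one substituted syllable.
reference = ["ko", "n", "ni", "chi", "wa"]
recognized = ["ko", "ni", "chi", "ha"]
print(syllable_error_rate(reference, recognized))
```

Under this definition, a lower value means the recognizer's transcript of the enhanced speech is closer to the reference, and the value can exceed 1 when the hypothesis contains many insertions.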
