Abstract

In this chapter, we propose speaker-specific prosodic features for improving the performance of speaker recognition in noisy environments. This approach can be especially useful in the forensic analysis of speech. Speaker recognition performance commonly degrades because of transmission and channel impairments, microphone variability, and background noise. In this work, spectral features are used to perform speaker recognition in the first stage, and dynamic aspects of speaker-specific prosody are used to improve the performance in the second stage. For this task, a speech corpus was collected at the Indian Institute of Technology, Kharagpur, from 50 speakers recorded over the mobile phone. Background noise is simulated by adding white random noise from the Noisex database. Speech enhancement techniques are used to improve speaker recognition performance on noisy speech. Gaussian mixture models (GMMs) and support vector machines (SVMs) are used for developing speaker models. The performance of the speaker recognition system is observed to be 55% and 66% using prosodic and spectral features, respectively, for TIMIT speech at 15 dB SNR. A speaker recognition performance of around 73% is achieved using the combination of spectral and prosodic features for noisy speech after speech enhancement.
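The noise simulation mentioned above can be illustrated with a minimal sketch of mixing white noise into a clean signal at a target SNR such as 15 dB. This is an assumption-laden illustration, not the chapter's procedure: it draws synthetic Gaussian white noise with NumPy instead of reading the Noisex recording, and the function names `add_white_noise_at_snr` and `measured_snr_db`, the 8 kHz sampling rate, and the 200 Hz test tone are all hypothetical choices made for the example.

```python
import numpy as np


def add_white_noise_at_snr(speech, snr_db, rng=None):
    """Return (noisy, scaled_noise) with the noise scaled to the requested SNR.

    Assumption: synthetic Gaussian white noise stands in for a recorded
    white-noise sample such as the one in the Noisex database.
    """
    rng = np.random.default_rng() if rng is None else rng
    speech = np.asarray(speech, dtype=np.float64)
    noise = rng.standard_normal(speech.shape)

    # Average power of the clean signal and of the unscaled noise.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR(dB) = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise, scaled_noise


def measured_snr_db(speech, noise):
    """Compute the SNR in dB from the clean signal and the added noise."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))


# Hypothetical stand-in for an utterance: one second of a 200 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2.0 * np.pi * 200.0 * t)

noisy, noise = add_white_noise_at_snr(clean, snr_db=15.0,
                                      rng=np.random.default_rng(0))
```

Because the noise is rescaled from its measured power, `measured_snr_db(clean, noise)` matches the requested value up to floating-point error. With real speech, the power would more sensibly be measured over speech-active frames only, which this sketch does not attempt.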
