Combining Evidences from Mel Cepstral and Cochlear Cepstral Features for Speaker Recognition Using Whispered Speech

Aditya Raikar, Hemant A Patil, Ami Gandhi

doi:10.1007/978-3-319-24033-6_46

Abstract

Whisper is an alternative mode of speech communication, used especially when a speaker does not want to reveal information to anyone other than the target listeners. Generally, speaker-specific information is present in both the excitation source and the vocal tract system. However, whispered speech contains few source characteristics, as there is almost no excitation by the vocal folds, and the speaker information in the vocal tract system is also lower than in the normal speech signal. Hence, it is difficult to recognize a speaker from his/her whispered speech. To address this, features based on vocal tract system characteristics, namely the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) and the recently developed Cochlear Frequency Cepstral Coefficients (CFCC), are proposed. The CHAINS (Characterizing Individual Speakers) whispered speech database is used for conducting experiments with the GMM-UBM (Gaussian Mixture Modeling-Universal Background Modeling) approach. The experiments show that the fusion of CFCC and MFCC improves the % IR (Identification Rate) and % EER (Equal Error Rate) over MFCC alone, indicating that the proposed features and their score-level fusion capture complementary speaker-specific information.

Keywords: Whisper, MFCC, CFCC, GMM-UBM, Source features, System features, CHAINS corpus
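The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted sum below, the weight `alpha`, and the function names `fuse_scores`, `identify`, and `identification_rate` are all assumptions made for illustration. The scores are taken to be per-speaker log-likelihood ratios from two separately trained GMM-UBM systems, already normalized to a comparable range.

```python
def fuse_scores(mfcc_scores, cfcc_scores, alpha=0.5):
    """Weighted-sum fusion of two per-speaker score lists.

    alpha is a hypothetical fusion weight (not taken from the paper):
    alpha = 1.0 uses MFCC scores only, alpha = 0.0 uses CFCC scores only.
    """
    if len(mfcc_scores) != len(cfcc_scores):
        raise ValueError("both systems must score the same set of speakers")
    return [alpha * m + (1.0 - alpha) * c
            for m, c in zip(mfcc_scores, cfcc_scores)]


def identify(scores):
    """Return the index of the enrolled speaker with the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def identification_rate(trials, alpha=0.5):
    """Percentage of trials whose fused top-scoring speaker is the true one.

    Each trial is a tuple (mfcc_scores, cfcc_scores, true_speaker_index).
    """
    if not trials:
        raise ValueError("at least one trial is required")
    correct = sum(
        identify(fuse_scores(m, c, alpha)) == true_idx
        for m, c, true_idx in trials
    )
    return 100.0 * correct / len(trials)
```

In practice the weight would be tuned on held-out data. Setting `alpha=1.0` reproduces an MFCC-only baseline, which gives a direct way to compare the fused system against MFCC alone.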

Full Text