Abstract

In a distant environment, channel distortion can drastically degrade speech recognition and speaker recognition performance. In this paper, we analyze how the estimation of compensation parameters for Cepstral Mean Normalization (CMN) affects speech and speaker recognition. We first investigate the differences between intra-speaker and inter-speaker variation by analyzing the cepstral distances of Japanese vowels. The analysis indicates that compensating for transmission characteristics affects the speech recognition task and the speaker recognition task differently. We then use Position-Dependent Cepstral Mean Normalization (PDCMN), which compensates for channel distortion depending on the speaker position, to evaluate speech recognition and speaker recognition performance in a distant environment. We conducted experiments on small-vocabulary (100 words) distant isolated word recognition in both simulated and real environments. The results indicate that the proposed PDCMN is more effective for speaker recognition than for speech recognition. We also investigate how factors such as the experimental environment, the utterance length, and the distance between the sound source and the microphone affect speech/speaker recognition, and we discuss solutions for the degradation these factors cause. The analysis allows us to decide which recognition method and processing are effective and necessary for a specific recognition task under a given experimental setup.
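
The abstract names CMN and PDCMN without giving formulas, so the following is only a minimal sketch of the general idea. Conventional CMN subtracts the per-utterance mean of each cepstral coefficient. For PDCMN, the sketch assumes the compensation mean is looked up from a table estimated in advance for each speaker position; that form, the function names, and the `position_means` table are illustrative assumptions, not the paper's implementation.

```python
from typing import Dict, List, Sequence

Frame = List[float]


def cepstral_mean(frames: Sequence[Sequence[float]]) -> Frame:
    """Average each cepstral coefficient over all frames of an utterance."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / n for k in range(dim)]


def subtract_mean(frames: Sequence[Sequence[float]],
                  mean: Sequence[float]) -> List[Frame]:
    """Subtract a compensation vector from every frame."""
    return [[c - m for c, m in zip(f, mean)] for f in frames]


def cmn(frames: Sequence[Sequence[float]]) -> List[Frame]:
    """Conventional CMN: the mean is estimated from the utterance itself."""
    return subtract_mean(frames, cepstral_mean(frames))


def pdcmn(frames: Sequence[Sequence[float]],
          position: str,
          position_means: Dict[str, Sequence[float]]) -> List[Frame]:
    """Position-dependent variant (assumed form): the mean comes from a
    table estimated beforehand for the given speaker position."""
    return subtract_mean(frames, position_means[position])


# Toy data: two frames with two cepstral coefficients each (hypothetical values).
utterance = [[1.0, 2.0], [3.0, 6.0]]
position_means = {"front": [0.5, 1.0]}  # hypothetical precomputed table

cmn_out = cmn(utterance)
pdcmn_out = pdcmn(utterance, "front", position_means)
```

Under this assumed form, the position-dependent variant does not need to estimate a mean from the incoming utterance, which matters when utterances are short, as with the isolated words used in the experiments.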
