Abstract

Cepstrum mean normalization is an effective method for recognizing distorted telephone speech. It compensates for the difference in bias on the cepstrum coefficients (CC) between training data and test data by subtracting the mean value of the CC, calculated from a certain amount of given speech data. Such adaptation data are not phonetically balanced, which makes it difficult to obtain an accurate mean value for the CC. This paper proposes a new approach to resolve this problem. Before recognition, two mean values are calculated: the mean of the adaptation data itself, and the mean derived from the Gaussian distributions of the continuous-density HMMs for the phonemes that appear in the adaptation data. During recognition, the difference between these two mean values is subtracted from the speech data. The proposed method was evaluated on various kinds of telephone speech data recorded through the public switched telephone network: speech from an ordinary analog telephone, a cordless handset telephone, and a digital cellular phone (a Japanese full-rate digital phone based on VSELP). The experimental results show an advantage for the new method.
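
The compensation step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions that the abstract does not state: each adaptation frame carries a phoneme label (for example from a forced alignment), each phoneme is represented by a single Gaussian mean vector rather than a full HMM state mixture, and the model-side mean is the frame-weighted average of those Gaussian means. The names `mean_vector`, `estimate_bias`, `compensate`, and the toy values are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(frames: Sequence[Sequence[float]]) -> Vector:
    """Per-coefficient mean over a sequence of cepstral frames."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def estimate_bias(
    adapt_frames: Sequence[Sequence[float]],
    adapt_labels: Sequence[str],
    phoneme_means: Dict[str, Vector],
) -> Vector:
    """Difference between the adaptation-data mean and the model-side mean.

    Assumption: the model-side mean averages the Gaussian mean of the
    phoneme aligned to each frame, so both means reflect the same
    (unbalanced) phoneme distribution.
    """
    data_mean = mean_vector(adapt_frames)
    model_mean = mean_vector([phoneme_means[p] for p in adapt_labels])
    return [d - m for d, m in zip(data_mean, model_mean)]


def compensate(frames: Sequence[Sequence[float]], bias: Vector) -> List[Vector]:
    """Subtract the estimated bias from every frame of the speech data."""
    return [[c - b for c, b in zip(frame, bias)] for frame in frames]


# Toy example with two cepstral coefficients (hypothetical values).
phoneme_means = {"a": [1.0, 0.0], "i": [0.0, 1.0]}

# Unbalanced adaptation data: three frames of "a", one of "i",
# each shifted by a channel bias of [0.5, -0.25].
adapt_labels = ["a", "a", "a", "i"]
adapt_frames = [[1.5, -0.25], [1.5, -0.25], [1.5, -0.25], [0.5, 0.75]]

plain_mean = mean_vector(adapt_frames)  # what plain mean subtraction would remove
bias = estimate_bias(adapt_frames, adapt_labels, phoneme_means)
restored = compensate([[1.5, -0.25]], bias)
```

In this toy case the plain data mean mixes the channel bias with the phoneme imbalance, whereas the difference of the two means isolates the channel shift, so subtracting it maps the shifted "a" frame back onto its model mean.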
