Abstract
In this paper, we continue our investigation of the warped discrete cosine transform cepstrum (WDCTC), which was earlier introduced as a new speech processing feature [1]. Here, we study the statistical properties of the WDCTC and compare them with the mel-frequency cepstral coefficients (MFCC). We report some interesting properties of the WDCTC when compared to the MFCC: its statistical distribution is more Gaussian-like with lower variance, it obtains better vowel cluster separability, it forms tighter vowel clusters and generates better codebooks. Further, we employ the WDCTC and MFCC features in a 5-vowel recognition task using Vector Quantization (VQ) and 1-Nearest Neighbour (1-NN) as classifiers. In our experiments, the WDCTC consistently outperforms the MFCC.

1. Introduction

We recently introduced the warped discrete cosine transform cepstrum (WDCTC) as a new speech processing feature and demonstrated its better performance than the mel-frequency cepstral coefficients (MFCC) in a vowel recognition and speaker-identification task [1]. The WDCTC has shown good promise as a speech processing feature and we are encouraged to further investigate the WDCTC feature and its statistical properties.

A large volume of training data is required to build speaker-independent speech recognition systems. One technique of reducing the data size is clustering the data and choosing a reasonable number of representative feature vectors to form codebooks [2]. Hence, codebook techniques are very relevant and practical to speech recognition systems. We form WDCTC and MFCC codebooks using a k-means clustering algorithm and compare the codebook statistics for clean and noisy vowels using the coefficient of variance and overlap ratio (defined later). Our experiment demonstrates that the WDCTC codebooks represent the underlying vowel data better than MFCC.

In order to compare the classification capability of the features, the WDCTC and MFCC are employed in a 5-vowel recognition task. Vector quantization (VQ) and 1-nearest neighbor (1-NN, [2]) are used as classifiers and their recognition performance is reported. We also investigate the clean and noisy vowel clusters formed by WDCTC and MFCC features and present the average separability of the vowel classes.
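
To make the codebook and classifier terminology concrete, the following is a minimal sketch of a k-means codebook with VQ and 1-NN classification. It is an illustration only, not the paper's implementation. The codebook size, the squared Euclidean distance, the iteration count, the toy 2-D data, and all function names (`kmeans_codebook`, `vq_classify`, `nn_classify`) are assumptions. In practice, the vectors would be WDCTC or MFCC feature vectors extracted from vowel frames.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_codebook(vectors, k, iters=20, seed=0):
    """Return k codewords (centroids) found by Lloyd's k-means iterations."""
    rng = random.Random(seed)
    # Initialise the codebook with k distinct training vectors (an assumption).
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        # Assign each vector to the cell of its nearest codeword.
        cells = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, codebook[j]))
            cells[nearest].append(v)
        # Move each codeword to the mean of its cell; keep it if the cell is empty.
        for j, cell in enumerate(cells):
            if cell:
                codebook[j] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def vq_classify(x, codebooks):
    """VQ: pick the class whose codebook contains the codeword closest to x."""
    return min(codebooks,
               key=lambda label: min(sq_dist(x, c) for c in codebooks[label]))

def nn_classify(x, training):
    """1-NN: return the label of the single closest training vector."""
    return min(training, key=lambda pair: sq_dist(x, pair[0]))[1]

# Toy 2-D data standing in for feature vectors of two vowel classes.
rng = random.Random(1)

def blob(cx, cy, n=40):
    return [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)] for _ in range(n)]

train = {"a": blob(0.0, 0.0), "i": blob(5.0, 5.0)}

# One codebook per class, here with 4 codewords each (hypothetical size).
codebooks = {label: kmeans_codebook(vecs, k=4) for label, vecs in train.items()}
labelled = [(v, label) for label, vecs in train.items() for v in vecs]

print(vq_classify([0.2, -0.1], codebooks))
print(nn_classify([4.8, 5.1], labelled))
```

The sketch shows the trade-off the codebook approach targets: 1-NN compares a test vector against every training vector, whereas VQ compares it only against the few codewords that summarise each class.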