Abstract
Vocal Tract Length Normalization (VTLN) is a very important speaker normalization technique for speech recognition tasks. In this paper, we propose the use of Gaussian posteriorgram of VTLN-warped spectral features for a Query-by-Example Spoken Term Detection (QbE-STD). This paper presents the use of a Gaussian Mixture Model (GMM) framework for estimation of VTLN warping factor. This GMM framework does not require phoneme-level transcription and hence, it can be useful for unsupervised tasks. We propose the iterative approach for VTLN warping factor estimation with two GMM training approaches, namely, Expectation-Maximization (EM) and Deterministic Annealing-Expectation Maximization (DAEM). The VTLN-warped Gaussian posteriorgram gave the better QbE-STD performance. The performance of TIMIT QbE-STD was investigated with different evaluation factors, such as a number of Gaussian components in GMM, various local constraints, and a number of iterations in VTLN warping factor estimation. VTLN-warped Gaussian posteriorgram reduces the speaker-specific variation in Gaussian posteriorgram and hence, it is expected to give better performance than Gaussian posteriorgram.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.