Abstract

We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered as a cluster of sounds from various speakers. The model infers its model complexity (i.e. the number of Gaussian components) from the data. For computation efficiency, we use a parallel sampler for the model inference. Our experiments are conducted on the corpus provided by the zero resource speech challenge. Experimental results show that the unsupervised DPGMM posteriorgrams obviously outperformMFCC, and perform comparably to the posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate of ABX discrimination test. The error rates can be further reduced by the fusion of these two kinds of posteriorgrams.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.