Abstract

This paper explores the novel application of two vector quantization algorithms, the Linde-Buzo-Gray (LBG) algorithm (1980) and the K-means algorithm, to efficient speaker verification. Automatic speaker verification (ASV) is a memory- and compute-intensive process, which raises area and latency concerns when it is implemented on real-time, efficient embedded systems. The training schemes used to compute the speaker models, such as expectation maximization (EM), are highly iterative and contribute significantly to the overall implementation complexity of the system. We demonstrate the use of the LBG and K-means algorithms as compute-efficient training methods. Models trained with the LBG algorithm achieve as much as 99.88% of EM accuracy, while models trained with K-means achieve as much as 99.91%. Moreover, the computational complexity of EM is almost twice that of LBG or K-means. We thus show that, when used to train Gaussian mixture speaker models for text-independent speaker verification, LBG and K-means deliver performance comparable to that of the EM algorithm at significantly reduced computational complexity, making them an ideal choice for low-cost applications.
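
As a rough illustration of the idea, the sketch below derives Gaussian mixture parameters directly from hard K-means clusters instead of running EM. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name kmeans_gmm, the diagonal covariances, the fixed iteration count, and the random initialisation are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def kmeans_gmm(features, n_components, n_iters=20, seed=0):
    """Estimate diagonal-covariance GMM parameters from hard K-means clusters.

    Hypothetical helper for illustration only; the abstract does not give
    the paper's exact training procedure.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    n_frames = features.shape[0]

    # Initialise centroids from randomly chosen distinct frames (an assumption).
    picks = rng.choice(n_frames, size=n_components, replace=False)
    means = features[picks].copy()

    def assign(centroids):
        # Nearest centroid by squared Euclidean distance.
        diffs = features[:, None, :] - centroids[None, :, :]
        return (diffs ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(means)
        for k in range(n_components):
            members = features[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                means[k] = members.mean(axis=0)

    # Final hard assignment against the converged centroids.
    labels = assign(means)
    counts = np.bincount(labels, minlength=n_components)

    # Mixture weights are the fraction of frames in each cluster.
    weights = counts / n_frames

    # Per-cluster variances about the centroid, floored for stability.
    variances = np.ones_like(means)
    for k in range(n_components):
        members = features[labels == k]
        if len(members) > 1:
            variances[k] = ((members - means[k]) ** 2).mean(axis=0) + 1e-6

    return weights, means, variances
```

Calling kmeans_gmm(frames, 16) on an array of per-frame feature vectors would return weights, means, and variances that can serve as a speaker model. Each iteration performs only a nearest-centroid search and an averaging step, without the per-frame posterior computations that EM requires.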
