Significance of constraining text in limited data text-independent speaker verification

Rohan Kumar Das, S R Mahadeva Prasanna, Sarfaraz Jelil

doi:10.1109/spcom.2016.7746659

Abstract

This work highlights the importance of a phonetic match between the train and test sessions for a text-independent framework under limited test data conditions. The robustness of text-independent speaker verification (SV) tends to degrade as the amount of speech involved is reduced. From the point of view of a deployable, application-oriented system, the amount of speech involved is expected to be small to ensure user comfort. Keeping this as a priority, and based on the literature in this direction, a framework is proposed for developing a text-constrained text-independent SV system that emulates the structure of the text-dependent framework. This framework recommends developing a text-constrained speaker model using limited data of around 10 sec. The same content is spoken by the user during testing. A baseline system is built over data collected in a practical scenario where sufficient train and limited test data are used. On evaluating the two systems with an i-vector based SV system, the text-constrained model based topology is found to work exceedingly well compared to the conventional method under the limited data condition. Further, it was observed that having the phonetic content of the test session in the training session helps improve the baseline system performance.

Full Text