Abstract

This paper studies how much different phones in speech data contribute to the performance of text- and language-independent speaker recognition systems. The work is motivated by the observation that removing silence segments from speech data improves system performance significantly, because silence contains no speaker-specific information. The literature also shows that phones do not all carry equal amounts of speaker-specific information, and that the performance of speaker recognition systems depends on this information. In addition to silence segments, we empirically identify 18 other diluent phones that have minimal speaker-discrimination capability. We propose a preprocessing stage that recursively identifies the set of non-informative phones and removes them along with the silence segments. Results show that a state-of-the-art i-vector system trained and tested on data preprocessed in this way outperforms the baseline i-vector system. On the IITG-MV database, we report absolute improvements in equal error rate (EER) of 1%, 1%, 2%, 2% and 1% for test sets collected through the Digital Voice Recorder, Headset, Mobile Phone 1, Mobile Phone 2 and Tablet PC channels, respectively.
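The recursive identification step can be illustrated with a minimal sketch. The abstract does not specify the selection procedure, so the code below assumes a greedy backward elimination: at each pass, the phone whose removal lowers EER the most is added to the removed set, and the search stops when no single removal helps. The function `find_diluent_phones`, the callback `eer_fn`, the phone labels and the toy gain values are all hypothetical stand-ins; in practice `eer_fn` would retrain and score a full i-vector system.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple


def find_diluent_phones(
    phones: Iterable[str],
    eer_fn: Callable[[FrozenSet[str]], float],
    start_removed: Iterable[str] = ("sil",),
) -> Tuple[Set[str], float]:
    """Greedy search for phones whose removal lowers EER (assumed procedure).

    eer_fn is a hypothetical callback that returns the EER (in percent)
    obtained when the given set of phones is removed from the data.
    """
    phones = list(phones)
    removed = set(start_removed)          # silence is removed from the start
    best_eer = eer_fn(frozenset(removed))
    while True:
        candidates = [p for p in phones if p not in removed]
        if not candidates:
            break
        # Try removing each remaining phone on top of the current set.
        trials = [(eer_fn(frozenset(removed | {p})), p) for p in candidates]
        eer, phone = min(trials)
        if eer >= best_eer:               # no single removal helps any more
            break
        removed.add(phone)
        best_eer = eer
    return removed, best_eer


# Toy stand-in for a real evaluation: each phone has a made-up effect on EER.
# Positive gain means removing the phone lowers EER (a "diluent" phone).
TOY_GAIN = {"sil": 2.0, "a": -1.0, "s": 0.5, "t": 0.25, "m": -0.5}


def toy_eer(removed: FrozenSet[str]) -> float:
    return 10.0 - sum(TOY_GAIN[p] for p in removed)


diluent, final_eer = find_diluent_phones(TOY_GAIN.keys(), toy_eer)
```

With these toy values the search keeps silence removed and additionally selects the two phones with positive gain. Each call to a real `eer_fn` is expensive, so a practical version would likely cache results or rank phones once rather than re-evaluating every candidate at every pass.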
