Abstract

In this paper, we extend the Hierarchical Mixture of Experts (HME) to temporal processing and apply it to a substantial problem: text-dependent speaker identification. For multiway classification, we propose a generalized Bernoulli density in place of the multinomial logit density to avoid instability during training. A time-delay technique is applied for spatio-temporal processing in the HME, and a scheme is presented for combining multiple time-delay HMEs to perform a multi-scale analysis of temporal data. Using the time-delay HME trained with the EM algorithm, together with the combination of multiple time-delay HMEs, the speaker identification system achieves good performance and trains fast. We also address some issues concerning time-delay techniques in the HME.
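
As a rough illustration of the time-delay idea, the sketch below stacks each feature frame with a fixed number of preceding frames, so that a static model sees a short temporal context as a single input vector. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function name `time_delay_windows`, the `delays` parameter, and the toy frames are all hypothetical, and the choice of two window lengths merely hints at how several time-delay HMEs could look at the same data at different time scales.

```python
from typing import List, Sequence


def time_delay_windows(
    frames: Sequence[Sequence[float]], delays: int
) -> List[List[float]]:
    """Concatenate each run of `delays + 1` consecutive frames into one vector.

    Hypothetical helper: the window layout is an assumption made for
    illustration, not a detail taken from the paper.
    """
    if delays < 0:
        raise ValueError("delays must be non-negative")
    width = delays + 1
    windows: List[List[float]] = []
    # Slide a window of `width` frames over the sequence, one frame at a time.
    for start in range(len(frames) - width + 1):
        window: List[float] = []
        for frame in frames[start:start + width]:
            window.extend(frame)
        windows.append(window)
    return windows


# Toy sequence of four 2-dimensional feature frames (made-up values).
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

# Two window lengths, standing in for two time scales.
short_scale = time_delay_windows(frames, delays=1)
long_scale = time_delay_windows(frames, delays=2)
```

Each vector in `short_scale` spans two frames and each vector in `long_scale` spans three; in a multi-scale setup of the kind described above, each set would feed a separate time-delay model whose outputs are then combined.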
