Abstract

An information-theoretic approach to speech modeling with prior statistical knowledge is proposed. Using the concept of minimum discrimination information (MDI), a model of speech can be factored into a prior distribution and an exponential correction term that depends on the specific training data. The discrimination information measures the statistical deviation of the training data from a prior model, in a way that is known to be optimal in a well-defined sense. Minimizing the discrimination information subject to the given training data as constraints yields a set of Lagrange multipliers. These multipliers characterize the part of the training data that is not described by the prior model. The problem of separating the speaker-dependent part from a 'universal' speaker-independent prior in hidden Markov models is studied in this framework, and a practical method for achieving this separation is derived. As an example, universal hidden Markov priors for isolated English digits are trained for male and female speakers using a database of 100 speakers and 20,000 spoken digits. The speaker-specific part is modeled by the individual Lagrange multipliers obtained by minimizing the discrimination information between the training data and the corresponding prior language model.
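The core MDI construction described above can be illustrated on a toy discrete distribution. This is a minimal sketch, not the paper's HMM formulation: given a hypothetical prior `p0` and a single moment constraint `E_p[f] = c` standing in for the training-data constraints, the MDI solution takes the exponential form `p(x) = p0(x) * exp(lam * f(x)) / Z(lam)`, and the Lagrange multiplier `lam` (the "exponential correction") is found numerically.

```python
import numpy as np

def mdi_fit(p0, f, c, iters=100):
    """Minimum discrimination information with prior p0 on a discrete
    alphabet, subject to the moment constraint E_p[f] = c.

    The minimizer is the exponentially tilted prior
        p(x) = p0(x) * exp(lam * f(x)) / Z(lam),
    and lam is solved for by Newton's method on the constraint.
    """
    lam = 0.0
    for _ in range(iters):
        w = p0 * np.exp(lam * f)
        p = w / w.sum()                      # tilted, renormalized prior
        mean = (p * f).sum()                 # current E_p[f]
        var = (p * f ** 2).sum() - mean**2   # d(mean)/d(lam) = Var_p[f]
        lam -= (mean - c) / var              # Newton step on the constraint
    return lam, p

# Hypothetical prior and feature, chosen only for illustration.
p0 = np.array([0.4, 0.3, 0.2, 0.1])
f = np.array([0.0, 1.0, 2.0, 3.0])
lam, p = mdi_fit(p0, f, c=1.5)
```

Here `lam` plays the role of the paper's speaker-specific Lagrange multipliers: it summarizes exactly the deviation of the constrained statistics from what the prior `p0` already predicts.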
