Abstract

Speaker classification is seen as a hypothesis testing problem of J simple hypotheses and a composite hypothesis. The simple hypotheses represent target speakers while the composite hypothesis represents nontarget speakers. The simple hypotheses have well-defined distributions that are estimated from training signals. The distribution of the signal under the composite hypothesis is assumed to belong to a given family. The parameter of that distribution is assumed random with a prior distribution that is estimated from a large set of speakers. This formulation converts the problem to that of testing J+1 simple hypotheses. Signals corresponding to target and nontarget speakers are assumed Gaussian mixtures processes. Once the system has been trained, list decoding is applied in which a test signal is associated with a list of possible speakers. The probability that the correct speaker is on the list is maximized for a given average number of incorrect speakers on the list. Results from speaker identification and speaker verification experiments are reported. In speaker identification using a National Institute of Standards and Technology (NIST) database with 174 target speakers, over 77% correct identification was achieved for an average of less than two erroneous speakers on the list. Speaker verification experiments on a similar database yielded results, expressed in terms of the equal-error-rate, of 6.7% and 10.1% using two decision rules.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call