Highly efficient and effective short‐term spectral representations of talkers can be obtained using vector quantization (VQ) codebook construction techniques. A talker recognition system has been implemented in which each talker is represented by a VQ codebook constructed from a large set of short‐term spectral vectors obtained from a series of training utterances provided by the talker. In operation, the utterances of an unknown talker are analyzed and “encoded” using the codebook of a specified talker. The accumulated distortion between the input utterances and the specified talker's codebook is used to carry out a talker recognition decision. This technique can be said to be text independent to the extent that the training utterances adequately represent each talker's speech sound repertoire. The system can be extended to text‐dependent operation with an additional training procedure in which specified utterances provided by a given talker are represented as encoded vector sequences using the talker's codebook. In use, an unknown talker is prompted to provide specified utterances which are analyzed and compared with the encoded prototypes for a specified talker. The system has been evaluated using a 100‐talker database of 20 000 digits spoken in isolation. In a talker verification mode, average equal‐error rate performance of 2.2% for text‐independent operation and 0.3% for text‐dependent operation is obtained for seven‐digit‐long test utterances.
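The text-independent verification step described above can be illustrated with a minimal sketch. The abstract does not state the codebook training algorithm, the distortion measure, the spectral features, or the decision threshold, so everything below is an assumption made for illustration: plain k-means stands in for codebook construction, squared Euclidean distance stands in for the spectral distortion, and toy 2-D Gaussian vectors stand in for short-term spectral vectors. All function names and parameter values are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (assumed stand-in for a spectral distortion)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Encode one vector: return (index, distortion) of the closest codeword."""
    best = min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))
    return best, sq_dist(vec, codebook[best])


def train_codebook(vectors, size, iters=20, seed=0):
    """Build a codebook with plain k-means (an assumption, not the paper's method)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        # Assign every training vector to its nearest codeword.
        cells = [[] for _ in range(size)]
        for v in vectors:
            idx, _ = nearest(v, codebook)
            cells[idx].append(v)
        # Move each codeword to the centroid of its cell; keep it if the cell is empty.
        for i, cell in enumerate(cells):
            if cell:
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook


def average_distortion(vectors, codebook):
    """Accumulated encoding distortion, normalized by the number of vectors."""
    return sum(nearest(v, codebook)[1] for v in vectors) / len(vectors)


def verify(vectors, codebook, threshold):
    """Accept the identity claim if the average distortion is at or below the threshold."""
    return average_distortion(vectors, codebook) <= threshold


def toy_vectors(center, n, seed, sigma=0.3):
    """Hypothetical 2-D 'spectral' vectors scattered around a talker-specific center."""
    rng = random.Random(seed)
    return [[rng.gauss(c, sigma) for c in center] for _ in range(n)]


# Two synthetic talkers with well-separated feature distributions.
codebook_a = train_codebook(toy_vectors((0.0, 0.0), 200, seed=1), size=4)
codebook_b = train_codebook(toy_vectors((5.0, 5.0), 200, seed=2), size=4)

# Unseen test vectors from talker A, scored against both codebooks.
test_a = toy_vectors((0.0, 0.0), 50, seed=3)
print("A vs. A codebook:", average_distortion(test_a, codebook_a))
print("A vs. B codebook:", average_distortion(test_a, codebook_b))
```

In this sketch a true claim produces a low average distortion and an impostor claim a high one, so a single threshold separates the two cases. In a real system the threshold would be tuned on held-out data, for example at the equal-error operating point reported in the evaluation.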