Abstract

This chapter describes the principles of speaker recognition and their application in smart environments. Speaker recognition is the process of recognizing a speaker from speech signals; it comprises speaker identification and speaker verification. Speaker identification is the process of determining which of the registered speakers produced a given utterance; speaker verification is the process of accepting or rejecting the identity claimed by a speaker. Speaker recognition can also be classified as text-dependent, text-independent, or text-prompted. Spectral envelope and prosodic features of speech are normally used as speaker features. To accommodate intra-speaker variations in signal characteristics, it is important to apply parameter-domain and likelihood-domain normalization/adaptation techniques. High-level features, such as word idiolect, pronunciation, phone usage, and prosody, have recently been investigated in text-independent speaker verification. Speaker diarization, an application of speaker identification technology, is defined as the task of deciding “who spoke when”: speech versus nonspeech decisions are made, and speaker changes are marked in the detected speech. Speaker diarization allows audio to be searched by speaker and makes speech recognition results easier to read. Increasingly, speaker segmentation and clustering are being used to aid the adaptation of speech recognizers and to supply metadata for audio indexing and searching. These techniques are becoming important in various applications that use speaker-related information in smart environments.
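
The distinction between identification and verification can be illustrated with a minimal sketch. It is not the chapter's method: the fixed-length feature vectors, the cosine score, the function names, the enrolled speakers, and the threshold of 0.8 are all assumptions chosen for illustration. A practical system would derive its features from the spectral envelope or prosody and would apply the normalization techniques mentioned above.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two feature vectors (hypothetical scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(utterance, enrolled):
    """Identification: pick the registered speaker whose model scores highest."""
    return max(enrolled, key=lambda name: cosine_score(utterance, enrolled[name]))

def verify(utterance, claimed, enrolled, threshold=0.8):
    """Verification: accept the claimed identity only if its score reaches the threshold."""
    return cosine_score(utterance, enrolled[claimed]) >= threshold

# Toy enrolled models and a test utterance (made-up numbers, assumption only).
enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.3]}
utterance = [0.9, 0.1, 0.2]

print(identify(utterance, enrolled))        # closest registered speaker
print(verify(utterance, "alice", enrolled)) # claim accepted or rejected
print(verify(utterance, "bob", enrolled))
```

Identification is an N-way choice among registered speakers, so its difficulty grows with the number of enrolled speakers. Verification is a binary decision against a threshold, which trades false acceptances against false rejections.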
