MareText Independent Speaker Identification based on K-mean Algorithm

Allam Mousa

doi:10.15676/ijeei.2011.3.1.8

Abstract

This paper proposes a text-independent speaker identification system that uses Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Vector Quantization (VQ) to minimize the data required for processing. The correlation between the identification success rate and the various system parameters, including the feature extraction tools and the data minimization technique, is examined. The extracted features of each speaker are quantized into a number of centroids using the K-means algorithm, which has been integrated into the proposed speaker identification system. These centroids constitute the codebook of that speaker. MFCC are calculated in both the training and testing phases. To calculate these MFCC, speakers uttered different words, once in a training session and once in a testing session. Speakers were identified according to the minimum quantization distance, calculated between the centroids of each speaker from the training phase and the MFCC of individual speakers in the testing phase. Analysis was carried out to identify parameter values that could be used to improve the performance of the system. The experimental results illustrate the efficiency of the proposed method under several conditions.

Speaker recognition aims at recognizing speakers from their voices, as each person has his or her own speech characteristics and way of speaking. Speaker recognition is basically divided into speaker identification and speaker verification. Speaker identification is the process of determining which registered speaker provides the speech input, while verification is the task of automatically determining whether a person really is who he or she claims to be. Speaker recognition has many practical applications, as a speaker's voice can be used to verify identity and control access to services such as banking by telephone, database access services, voice dialing, telephone shopping, information services and voice mail.
Another important application of speaker recognition technology is for forensic purposes (1). Speaker recognition methods can be classified as text-dependent or text-independent. In the text-dependent method, the speaker has to say key words or sentences having the same text in both training and recognition trials, whereas in the text-independent case the system can identify the speaker regardless of what is being said (2), (3), (4). The goal of this study is a real-time text-independent speaker identification system, which compares a speech signal from an unknown speaker to a database of known speakers. The system operates in two modes: a training mode and a recognition mode. During the training mode, users record their voices and a feature model is built from each recording. The recognition mode uses the information that users provided in the training mode and attempts to isolate and identify the speaker. The Mel Frequency Cepstral Coefficients (MFCC) and Vector Quantization (VQ) algorithms are used to implement this process. The simple K-means clustering algorithm is used in this study, whereas the LBG algorithm is used in other similar work (4).
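
The training and recognition modes described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that MFCC vectors have already been extracted elsewhere and are supplied as NumPy arrays of shape (frames, coefficients). The function names `train_codebook`, `quantization_distance` and `identify` are hypothetical, and the Euclidean distance, the averaging over frames, the random initialization and the fixed iteration cap are assumptions made for illustration only.

```python
import numpy as np


def train_codebook(features, k, n_iter=50, seed=0):
    """Build a k-centroid codebook from (n_frames, n_coeffs) features
    with plain K-means (assumed: random init, Euclidean distance)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct training frames.
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].astype(float)
    for _ in range(n_iter):
        # Squared distance from every frame to every centroid.
        diff = features[:, None, :] - centroids[None, :, :]
        labels = (diff ** 2).sum(axis=2).argmin(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break  # assignments have stabilised
        centroids = updated
    return centroids


def quantization_distance(features, codebook):
    """Average distance from each test frame to its nearest centroid."""
    diff = features[:, None, :] - codebook[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist.min(axis=1).mean()


def identify(features, codebooks):
    """Return the name of the speaker whose codebook gives the
    minimum quantization distance to the test features."""
    return min(
        codebooks,
        key=lambda name: quantization_distance(features, codebooks[name]),
    )
```

In this sketch, training amounts to calling `train_codebook` once per registered speaker and storing the results in a dictionary keyed by speaker name. Recognition passes the MFCC of the unknown utterance together with that dictionary to `identify`.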
