Abstract

A speaker identification system (10) employs a supervised training process (100) that uses row action projection (RAP) to generate speaker model data for a set of speakers. Because RAP operates on a single row of a matrix at a time, the training process uses less memory and fewer processing resources. The memory required to store the speakers' information grows linearly with the number of speakers. A speaker is identified from the set of speakers by sampling the speaker's speech (202), deriving cepstral coefficients (208), and performing a polynomial expansion (212) on the cepstral coefficients. The identified speaker (228) is selected using the product of the speaker model data (213) and the polynomial-expanded coefficients from the speech sample.
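The abstract does not give the exact RAP update or scoring rule. The following Python sketch is only an illustration under stated assumptions. RAP is taken to be the Kaczmarz row-projection iteration. Each speaker's model is fitted one-vs-rest, with a target of 1 for that speaker's frames and 0 for all other frames. An utterance is scored by the dot product of each model with the average polynomial-expanded feature vector. All function names and the toy two-dimensional vectors, which stand in for cepstral coefficients, are hypothetical. Cepstral extraction from sampled speech is omitted.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence

Vector = List[float]


def polynomial_expand(features: Sequence[float], order: int = 2) -> Vector:
    """All monomials up to `order`, constant first: (x1, x2) -> [1, x1, x2, x1^2, x1*x2, x2^2]."""
    terms: Vector = [1.0]
    for degree in range(1, order + 1):
        for combo in combinations_with_replacement(range(len(features)), degree):
            product = 1.0
            for index in combo:
                product *= features[index]
            terms.append(product)
    return terms


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_speaker_rap(rows: Sequence[Vector], targets: Sequence[float],
                      sweeps: int = 1000, relax: float = 1.0) -> Vector:
    """Assumed RAP variant (Kaczmarz): project w onto one row's hyperplane at a time."""
    w = [0.0] * len(rows[0])
    for _ in range(sweeps):
        for row, target in zip(rows, targets):
            norm_sq = dot(row, row)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no constraint
            step = relax * (target - dot(row, w)) / norm_sq
            for j, value in enumerate(row):
                w[j] += step * value
    return w


def train_all(frames_by_speaker: Dict[str, List[Sequence[float]]],
              order: int = 2, sweeps: int = 1000) -> Dict[str, Vector]:
    """One model vector per speaker; storage grows linearly with the number of speakers."""
    rows: List[Vector] = []
    owners: List[str] = []
    for speaker, frames in frames_by_speaker.items():
        for frame in frames:
            rows.append(polynomial_expand(frame, order))
            owners.append(speaker)
    models: Dict[str, Vector] = {}
    for speaker in frames_by_speaker:
        # Hypothetical one-vs-rest targets: 1 for this speaker's frames, 0 otherwise.
        targets = [1.0 if owner == speaker else 0.0 for owner in owners]
        models[speaker] = train_speaker_rap(rows, targets, sweeps)
    return models


def score_utterance(models: Dict[str, Vector], frames: List[Sequence[float]],
                    order: int = 2) -> Dict[str, float]:
    """Score = model . (mean of the expanded frames) for each speaker."""
    expanded = [polynomial_expand(frame, order) for frame in frames]
    mean = [sum(column) / len(expanded) for column in zip(*expanded)]
    return {speaker: dot(w, mean) for speaker, w in models.items()}


def identify(models: Dict[str, Vector], frames: List[Sequence[float]], order: int = 2) -> str:
    scores = score_utterance(models, frames, order)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    training = {
        "alice": [(1.0, 0.0), (2.0, 0.0)],
        "bob": [(0.0, 1.0), (0.0, 2.0)],
    }
    models = train_all(training)
    print(identify(models, [(1.0, 0.0), (2.0, 0.0)]))  # alice
```

In this sketch each update reads only one expanded row and the current model vector, which mirrors the abstract's point about operating on a single row at a time. Each speaker's model holds one weight per monomial, so the total model storage scales linearly with the number of speakers.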
