Abstract

One method for providing speaker-independent word recognition is to construct, for each vocabulary word, a small set of templates that typifies and spans the individual speakers' word reference templates over a large population of speakers. Word recognition decision functions are based on combinations of template distance scores obtained by processing an unknown input utterance and comparing it with the ensemble of reference templates. This study hypothesizes that the distributions of template distance scores are reasonably consistent for an individual speaker and vary characteristically from speaker to speaker. This property is exploited to provide speaker recognition in combination with word recognition. Good speaker recognition performance is shown to depend on the input of a sequence of distinct words. For a 20-speaker population, on average, the correct speaker is in the top 1% of the candidates in the identification made over a sequence of seven distinct words.
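
The idea of identifying a speaker from template distance scores accumulated over several distinct words can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the study: the data layout (`profiles`, `observed`), the function name `rank_speakers`, the use of a per-speaker mean score vector as the model of a distribution, and the Euclidean mismatch measure summed over words. The study's actual decision function may differ.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def rank_speakers(profiles, observed):
    """Rank candidate speakers, best match first.

    Assumed layout (hypothetical, not from the study):
      profiles[speaker][word] -> that speaker's mean vector of template
                                 distance scores for the word, one entry
                                 per speaker-independent reference template
      observed[word]          -> score vector measured for the unknown
                                 speaker's utterance of that word
    """
    totals = {}
    for speaker, by_word in profiles.items():
        # Accumulate the mismatch over every distinct word in the sequence.
        totals[speaker] = sum(
            dist(by_word[word], scores) for word, scores in observed.items()
        )
    # A smaller accumulated mismatch means a more likely speaker.
    return sorted(totals, key=totals.get)


# Toy data: two enrolled speakers, two words, two templates per word.
profiles = {
    "spk_a": {"one": [1.0, 4.0], "two": [2.0, 2.0]},
    "spk_b": {"one": [3.0, 1.0], "two": [5.0, 1.0]},
}
observed = {"one": [1.0, 3.0], "two": [2.0, 3.0]}
ranking = rank_speakers(profiles, observed)
print(ranking)
```

Summing over words reflects the abstract's observation that performance depends on a sequence of distinct words: each added word contributes another independent comparison, so the ranking is less sensitive to a single atypical utterance.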
