Abstract
A method and apparatus are disclosed for transcribing speech when multiple speakers are participating. A number of different speech recognition systems, each with a different speaker model, are executed in parallel. When the identities of all participating speakers are known and a speaker model is available for each participant, each speech recognition system employs a speaker model suited to a corresponding participant. Each speech recognition system decodes the speech and generates a corresponding confidence score. The decoded output having the highest confidence score is selected for presentation to a user. When not all participating speakers are known, or when there are too many participants to implement a unique speaker model for each participant, a speaker-independent speech recognition system is employed together with a speaker-specific speech recognition system. A controller selects between the decoded outputs of the speaker-independent speech recognition system and the speaker-specific speech recognition system based on information received from a speaker identification system and a speaker change detector.
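The two selection modes described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `Decoded`, `select_best`, and `controller_select` are hypothetical, recognizers are modeled as plain callables, and the controller policy shown (use the speaker-specific output only when the identified speaker matches the loaded model and no speaker change was detected) is an assumption, since the abstract does not state the exact rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Decoded:
    """Output of one recognizer (hypothetical structure for illustration)."""
    text: str
    confidence: float
    speaker: Optional[str] = None  # speaker model used, None if speaker independent


# A recognizer is modeled as a callable from raw audio to a decoded result.
Recognizer = Callable[[bytes], Decoded]


def select_best(audio: bytes, recognizers: Sequence[Recognizer]) -> Decoded:
    """Known-speakers mode: run every recognizer, keep the top confidence score."""
    if not recognizers:
        raise ValueError("at least one recognizer is required")
    # Run sequentially here for simplicity; the abstract describes parallel execution.
    outputs = [recognize(audio) for recognize in recognizers]
    return max(outputs, key=lambda d: d.confidence)


def controller_select(
    independent: Decoded,
    specific: Decoded,
    identified_speaker: Optional[str],
    speaker_changed: bool,
) -> Decoded:
    """Unknown-speakers mode: choose between the two decoded outputs.

    Assumed policy: trust the speaker-specific output only when the
    identification system names the speaker whose model is loaded and the
    change detector reports no speaker change in this segment.
    """
    model_matches = (
        identified_speaker is not None and identified_speaker == specific.speaker
    )
    if model_matches and not speaker_changed:
        return specific
    return independent
```

In this sketch, a speaker change or a failed identification falls back to the speaker-independent output, on the assumption that a mismatched speaker model decodes less reliably than a generic one.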