Abstract

This paper gives an overview of a system for phoneme-based large-vocabulary continuous-speech recognition. The system provides the speaker-dependent recognition component of the speech understanding system SPICOS, which is designed to recognize and understand database queries spoken in natural German. The recognition technique used in the SPICOS project is based on an integrated approach. It combines the various knowledge sources, such as the inventory of subword units, the pronunciation lexicon and the language model, during decision making in order to improve the reliability of acoustic recognition. Recognition then amounts to an efficient search through a huge state space, in which purely local decisions are avoided and globally optimal decisions can be taken. The size of this state space depends primarily on the type of language model used. Three language-model conditions are studied: no language constraints, a finite-state network, and a stochastic trigram model based on word categories. For each of the three conditions, recognition experiments were carried out on a 917-word task with four speakers. For each speaker, 200 sentences totalling 1391 words had to be recognized.
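The abstract names a stochastic trigram model based on word categories but does not give its equations. The following is a minimal sketch that assumes the common class-based factorization p(w | u, v) = p(w | g(w)) · p(g(w) | g(u), g(v)). Here each word belongs to exactly one category g(w), and all probabilities are unsmoothed relative frequencies. The class name `CategoryTrigram`, the `<s>` padding symbol and the toy vocabulary are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List

BOS = "<s>"  # hypothetical sentence-start padding symbol


class CategoryTrigram:
    """Hypothetical category-based trigram language model.

    Assumes each word belongs to exactly one category g(w) and that
    p(w | u, v) = p(w | g(w)) * p(g(w) | g(u), g(v)),
    estimated by unsmoothed relative frequencies.
    """

    def __init__(self, word2cat: Dict[str, str]) -> None:
        self.word2cat = word2cat
        self.cat_tri: Counter = Counter()     # N(g(u), g(v), g(w))
        self.cat_hist: Counter = Counter()    # N(g(u), g(v)) as a trigram history
        self.word_count: Counter = Counter()  # N(w)
        self.cat_count: Counter = Counter()   # N(g), summed over member words

    def _cat(self, word: str) -> str:
        # The padding symbol forms its own category.
        return BOS if word == BOS else self.word2cat[word]

    def train(self, sentences: List[List[str]]) -> None:
        for sent in sentences:
            for w in sent:
                self.word_count[w] += 1
                self.cat_count[self._cat(w)] += 1
            # Pad with two start symbols so the first word has a full history.
            cats = [BOS, BOS] + [self._cat(w) for w in sent]
            for i in range(2, len(cats)):
                hist = (cats[i - 2], cats[i - 1])
                self.cat_tri[hist + (cats[i],)] += 1
                self.cat_hist[hist] += 1

    def prob(self, u: str, v: str, w: str) -> float:
        """Return p(w | u, v); 0.0 for unseen events (no smoothing)."""
        g_w = self._cat(w)
        n_g = self.cat_count[g_w]
        p_word = self.word_count[w] / n_g if n_g else 0.0
        hist = (self._cat(u), self._cat(v))
        n_h = self.cat_hist[hist]
        p_cat = self.cat_tri[hist + (g_w,)] / n_h if n_h else 0.0
        return p_word * p_cat


# Toy data, invented for illustration only.
word2cat = {"show": "VERB", "list": "VERB", "all": "DET",
            "the": "DET", "authors": "NOUN", "papers": "NOUN"}
model = CategoryTrigram(word2cat)
model.train([["show", "all", "authors"], ["list", "the", "papers"]])
print(model.prob(BOS, BOS, "show"))         # 0.5
print(model.prob("show", "the", "papers"))  # 0.5, although this word sequence was never seen
print(model.prob("show", "all", "show"))    # 0.0
```

Statistics are pooled over categories, so the unseen word sequence "show the papers" still receives probability mass. This illustrates why category-based models can generalize from limited training text. A practical system would also need smoothing for unseen category trigrams and an end-of-sentence event.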
