Abstract

This paper presents an artificial neural network (ANN) for speaker-independent isolated word speech recognition. The network consists of three subnets in concatenation. The static information within one frame of speech signal is processed in the probabilistic mapping subnet that converts an input vector of acoustic features into a probability vector whose components are estimated probabilities of the feature vector belonging to the phonetic classes that constitute the words in the vocabulary. The dynamics capturing subnet computes the first-order cross correlation between the components of the probability vectors to serve as the discriminative feature derived from the interframe temporal information of the speech signal. These dynamic features are passed for decision-making to the classification subnet, which is a multilayer perceptron (MLP). The architecture of these three subnets are described, and the associated adaptive learning algorithms are derived. The recognition results for a subset of the DARPA TIMIT speech database are reported. The correct recognition rate of the proposed ANN system is 95.5%, whereas that of the best of continuous hidden Markov model (HMM)-based systems is only 91.0%. >

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.