Abstract

A method and system for multi-frame prediction in a hybrid neural network/hidden Markov model automatic speech recognition (ASR) system is disclosed. An audio input signal may be transformed into a time sequence of feature vectors, each corresponding to respective temporal frame of a sequence of periodic temporal frames of the audio input signal. The time sequence of feature vectors may be concurrently input to a neural network, which may process them concurrently. In particular, the neural network may concurrently determine for the time sequence of feature vectors a set of emission probabilities for a plurality of hidden Markov models of the ASR system, where the set of emission probabilities are associated with the temporal frames. The set of emission probabilities may then be concurrently applied to the hidden Markov models for determining speech content of the audio input signal.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.