Abstract

A common belief in the speech recognition community is that the most significant improvements in machine performance come from more training data. Implicit in this belief is the assumption that the speech to be recognized comes from the same distribution as the speech on which the machine was trained. Problems occur when this assumption is violated. Words that are not in the machine's lexicon, unexpected signal distortions and noises, unfamiliar accents, and other speech peculiarities all create problems for current automatic speech recognition (ASR) systems. The problem is inherent to machine learning and will not go away unless alternatives are found to the heavy reliance on the false belief in an unchanging world. In ASR, words that are not in the machine's expected lexicon are typically replaced by acoustically similar but nevertheless wrong words. Similarly, unexpected noise is typically ignored in human speech communication but causes significant problems for a machine. We discuss a biologically inspired multistream architecture for a speech recognition machine that could alleviate some of the problems caused by unexpected acoustic inputs. Some published experimental results are given.

