Abstract
A common belief in the speech recognition community is that the most significant improvements in machine performance come from more training data. Implicit in this belief is the assumption that the speech to be recognized comes from the same distribution as the speech on which the machine was trained. Problems occur when this assumption is violated. Words that are not in the machine's lexicon, unexpected signal distortions and noises, unknown accents, and other speech peculiarities all create problems for current automatic speech recognition (ASR). The problem is inherent to machine learning and will not go away unless alternatives are found to the heavy reliance on the false belief in an unchanging world. In automatic speech recognition, words that are not in the machine's expected lexicon are typically replaced by acoustically similar but nevertheless wrong words. Similarly, unexpected noise is typically ignored in human speech communication but causes significant problems for a machine. We discuss a biologically inspired multistream architecture for a speech recognition machine that could alleviate some of the problems with unexpected acoustic inputs. Some published experimental results are given.