Abstract
The recent success of speech technology is undeniable. Developers at Microsoft and IBM have reported that automatic speech recognition systems perform at human level when transcribing conversational telephone speech. According to various estimates, their word error rate (WER) is now about 5.1–5.8%. However, the most challenging problems in speech recognition, namely diarization and noise cancellation, remain open. A comparative analysis of the most frequent errors made by systems and by people on the recognition task shows that, in general, the errors are similar. Yet the errors a human makes are much less critical: they seldom distort the meaning of a statement. In other words, human errors are rarely semantic. For this reason, the mechanisms of human speech perception are the most promising area of research. This paper proposes a model of the general structure of active auditory perception, together with the neurobiological basis for the hypothesis put forward. The proposed concept serves as a basic platform for a general multiagent architecture. We assume that speech recognition is guided by attention even in its early stages, and that the early auditory code is modified by context and experience. The model simulates the involuntary attention that children use when mastering their native language, which is based on an emotional assessment of perceptually significant auditory information. The internal multiagent dynamics of auditory speech coding can provide new insights into how hearing impairment can be treated. The formal description of the structure of speech perception can serve as a general theoretical basis for developing universal automatic speech recognition systems that are highly effective in noisy conditions and cocktail-party situations. Multiagent systems provide the formal means for implementing the present model in software.
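For reference, WER is conventionally defined as (S + D + I) / N, where S, D and I are the numbers of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, and N is the number of words in the reference. The following minimal Python sketch computes it with a word-level Levenshtein distance. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
    print(f"WER = {word_error_rate(ref, hyp):.3f}")  # prints "WER = 0.333"
```

Note that WER weights every word error equally, whether or not it changes the meaning of the utterance, so two systems with the same WER can differ in how many of their errors are semantic.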