Abstract

Our knowledge of the computational mechanisms underlying human learning and recognition of sound sequences, especially speech, is still very limited. One difficulty in deciphering the exact means by which humans recognize speech is the scarcity of experimental findings at a neuronal, microscopic level. Here, we show that our neuronal-computational understanding of speech learning and recognition may be vastly improved by looking at an animal model, the songbird, which faces the same challenge as humans: to learn and decode complex auditory input in an online fashion. Motivated by striking similarities between the human and songbird neural recognition systems at the macroscopic level, we assumed that the human brain uses the same computational principles at a microscopic level and translated a birdsong model into a novel human sound learning and recognition model, with an emphasis on speech. We show that the resulting Bayesian model, built on a hierarchy of nonlinear dynamical systems, can learn speech samples such as words rapidly and recognize them robustly, even in adverse conditions. In addition, we show that recognition succeeds even when words are spoken by different speakers and with different accents, an everyday situation in which current state-of-the-art speech recognition models often fail. The model can also be used to qualitatively explain behavioral data on human speech learning and to derive predictions for future experiments.
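
To make the architecture concrete, the following is a minimal sketch, not the paper's equations: a two-level hierarchy of nonlinear dynamical systems in which a slow level drives a fast level that generates the sensory trajectory, and recognition inverts the hierarchy by correcting each level with its prediction error. All dimensions, dynamics (f_slow, f_fast), couplings (W, C, G), and gains (k_obs, k_top) are illustrative assumptions introduced here.

```python
import numpy as np

# Illustrative sketch only -- not the authors' equations. Two-level hierarchy
# of nonlinear dynamical systems: a slow state x_s modulates a fast state x_f,
# which generates the observed trajectory z(t). Recognition runs the same
# dynamics forward while nudging each level with the sensory prediction error,
# a crude stand-in for the Bayesian inversion described in the abstract.

rng = np.random.default_rng(0)
dt, T = 0.01, 500                            # step size and number of samples
n_s, n_f = 3, 6                              # hypothetical state dimensions
W = 0.5 * rng.standard_normal((n_f, n_s))    # slow -> fast coupling (assumed)
C = 0.5 * rng.standard_normal((n_f, n_f))    # fast recurrent weights (assumed)
G = rng.standard_normal((2, n_f))            # fast -> output mapping (assumed)

def f_slow(x):
    """Slow nonlinear flow: an arbitrary ring of tanh units with decay."""
    return np.tanh(np.roll(x, 1)) - x

def f_fast(x, top):
    """Fast nonlinear flow, driven by the level above."""
    return np.tanh(C @ x + W @ top) - x

# Generation: simulate hidden states and a noisy sensory stream z(t).
xs, xf = 0.1 * np.ones(n_s), np.zeros(n_f)
Z = np.empty((T, 2))
for t in range(T):
    xs = xs + dt * f_slow(xs)
    xf = xf + dt * f_fast(xf, xs)
    Z[t] = G @ xf + 0.01 * rng.standard_normal(2)

# Recognition: filter both levels online using the prediction error e0.
es, ef = np.zeros(n_s), np.zeros(n_f)        # state estimates
k_obs, k_top = 4.0, 1.0                      # error gains (assumed)
for t in range(T):
    e0 = Z[t] - G @ ef                       # sensory prediction error
    u = C @ ef + W @ es                      # fast pre-activation
    # Pull e0 back through the output map (G.T) and, for the slow level,
    # additionally through one step of the fast dynamics (chain rule).
    ef = ef + dt * ((np.tanh(u) - ef) + k_obs * (G.T @ e0))
    es = es + dt * (f_slow(es) + k_top * (W.T @ ((1 - np.tanh(u) ** 2) * (G.T @ e0))))

print("final sensory prediction error:", float(np.linalg.norm(e0)))
```

A full Bayesian treatment would additionally carry uncertainty (precision) estimates for each level; this sketch propagates point estimates only.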

Highlights

  • Can we learn something about how humans recognize speech from how birds recognize song? The last common ancestor of humans and birds lived about 300 million years ago, yet human and songbird communication share several striking features at the cognitive, neuronal and molecular level [1,2].

  • A single learning step consists of learning one word by one module. Both recognition and learning in a module start with sensation: a speech sound wave which, after passing through the cochlea model, serves as dynamic input to the module.

  • The speech signal is preprocessed by the cochlea model, and its dynamic output, which we denote by a vector z(t), reaches the first level of the module (Figure 1; for mathematical details see Model); a filterbank sketch of this stage follows after this list.
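
The preprocessing stage in the last bullet can be pictured as follows. This is a hedged sketch, not the authors' cochlea model: a log-spaced bandpass filterbank with envelope extraction stands in for cochlear processing, and the sampling rate, band edges, channel count, toy input, and the name cochlea_like are all illustrative choices introduced here.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

# Hedged sketch of the preprocessing stage above: a log-spaced bandpass
# filterbank with envelope extraction stands in for the cochlea model.
# Sampling rate, band edges, channel count, and the toy input are all
# our own illustrative choices.

fs = 16_000                                   # sampling rate in Hz (assumed)
t = np.arange(0, 0.5, 1 / fs)
wave = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

def cochlea_like(wave, fs, n_channels=8, f_lo=100.0, f_hi=4000.0):
    """Return z with shape (n_samples, n_channels): one envelope per band."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    z = np.empty((wave.size, n_channels))
    for k in range(n_channels):
        sos = butter(4, [edges[k], edges[k + 1]], btype="bandpass",
                     fs=fs, output="sos")
        band = sosfilt(sos, wave)                     # band-limited signal
        z[:, k] = np.abs(hilbert(band))               # amplitude envelope
    return z

z = cochlea_like(wave, fs)
print(z.shape)   # (8000, 8); each row is the momentary input vector z(t)
```

Each row of z is then the dynamic input vector z(t) that would drive the first level of a module.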

Introduction

Can we learn something about how humans recognize speech from how birds recognize song? The last common ancestor of humans and birds lived about 300 million years ago, yet human and songbird communication share several striking features at the cognitive, neuronal and molecular level [1,2]. Our brains map fast modulations of the speech sound wave to spectrotemporal auditory representations [3,4]. Songbirds map song sound wave modulations to specific internal representations [5,6]. The similarities include anatomical and functional features that characterize the pathways for vocal production, auditory processing and learning [1,2,7]. Much more is known experimentally about the exact neuronal mechanisms in songbirds than in humans, thanks to detailed electrophysiological studies, which have shown that songbirds use a sequence of auditory dynamics to generate and recognize song in a highly effective manner [6,13].

