Abstract
We propose a developmental model inspired by the cortico-basal system (CX-BG) for vocal learning in infants and for solving the correspondence mismatch problem they face when hearing unfamiliar voices with different tones and pitches. The model is based on the neural architecture INFERNO (Iterative Free-Energy Optimization of Recurrent Neural Networks). Free-energy minimization is used to rapidly explore, select and learn the optimal actions to perform (e.g., sound production) in order to reproduce and control as accurately as possible the spike trains representing desired perceptions (e.g., sound categories). In this paper we detail the CX-BG system responsible for causally linking sound and motor primitives on the order of a few milliseconds. Two experiments performed with a small and a large audio database demonstrate the exploration, generalization and noise-robustness capabilities of our neural architecture when retrieving audio primitives during vocal learning and during acoustic matching with unheard voices (different genders and tones).
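As a rough intuition for the free-energy-driven action selection described above, the sketch below explores candidate motor primitives and keeps the one whose predicted spike train best matches a desired pattern. The forward model, the squared-error proxy for free energy, and all names (`action_to_spikes`, `free_energy`, `select_action`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_UNITS = 50      # size of the spike-train representation (assumed)
N_ACTIONS = 20    # number of candidate motor primitives (assumed)

# Hypothetical forward model: each motor primitive maps to a predicted
# spike-train pattern (a fixed random table here, purely illustrative).
action_to_spikes = rng.random((N_ACTIONS, N_UNITS))

def free_energy(predicted, desired):
    """Squared prediction error, used here as a simple proxy for free energy."""
    return np.sum((predicted - desired) ** 2)

def select_action(desired, n_explore=10):
    """Iteratively explore candidate actions and keep the one whose predicted
    spike train minimizes the free-energy proxy (INFERNO-style selection)."""
    best_action, best_fe = None, np.inf
    for _ in range(n_explore):
        a = rng.integers(N_ACTIONS)                      # random exploration step
        fe = free_energy(action_to_spikes[a], desired)   # compare prediction to target
        if fe < best_fe:
            best_action, best_fe = a, fe
    return best_action, best_fe

# Desired perception: the spike-train pattern of a target sound category.
desired_pattern = rng.random(N_UNITS)
action, fe = select_action(desired_pattern)
print(f"selected motor primitive {action}, free-energy proxy {fe:.3f}")
```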
Highlights
Infants learn language by perceptually matching the low-level auditory features they hear with the articulatory motions they perform during vocal production
The Primary Auditory Cortex (PAC), Superior Temporal Gyrus (STG) and Striatum (STR) layers learn in an unsupervised manner so that the three structures self-organize into sparse distributions: the PAC and the Striatum use Hebb's law, whereas the STG learns temporal dependencies across time with the Spike-Timing-Dependent Plasticity (STDP) learning mechanism; the information flows in the direction PAC → STG → STR
Selection in the self-organized mode is done through a winner-takes-all rule, meaning that the STR unit with the highest activity is the one selected (see the sketch after these highlights)
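As a rough illustration of these highlights, the sketch below runs a toy PAC → STG → STR pass with a Hebbian update, a simplified STDP update and a winner-takes-all readout. Layer sizes, the learning rate, the weight names (`W_pac_stg`, `W_stg_str`) and the exact form of the STDP rule are our own assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

N_PAC, N_STG, N_STR = 64, 32, 16   # layer sizes (illustrative assumptions)
ETA = 0.01                          # learning rate (assumed)

W_pac_stg = rng.random((N_STG, N_PAC)) * 0.1   # PAC -> STG weights
W_stg_str = rng.random((N_STR, N_STG)) * 0.1   # STG -> STR weights

def hebbian_update(W, pre, post, eta=ETA):
    """Hebb's law: strengthen weights between co-active pre- and post-synaptic units."""
    return W + eta * np.outer(post, pre)

def stdp_update(W, pre_prev, post_now, pre_now, post_prev, eta=ETA):
    """Simplified STDP: potentiate when presynaptic activity precedes postsynaptic
    activity, depress when it follows (captures temporal dependencies in the STG)."""
    return W + eta * (np.outer(post_now, pre_prev) - np.outer(post_prev, pre_now))

def winner_takes_all(activity):
    """Winner-takes-all: only the most active STR unit is selected."""
    out = np.zeros_like(activity)
    out[np.argmax(activity)] = 1.0
    return out

# One illustrative pass over two consecutive time steps of PAC activity.
pac_prev, pac_now = rng.random(N_PAC), rng.random(N_PAC)
stg_prev = W_pac_stg @ pac_prev
stg_now = W_pac_stg @ pac_now
str_now = winner_takes_all(W_stg_str @ stg_now)

W_pac_stg = stdp_update(W_pac_stg, pac_prev, stg_now, pac_now, stg_prev)  # STG: STDP
W_stg_str = hebbian_update(W_stg_str, stg_now, str_now)                    # STR: Hebb
```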
Summary
Infants learn language by perceptually matching the low-level auditory features they hear with the articulatory motions they perform during vocal production. In order to correctly interpret which sound has been pronounced and which articulatory motion produces it, brain networks have to be organized flexibly early in infancy so as to retrieve and categorize memory sequences on the order of milliseconds [2, 3]. We propose a brain-inspired model for early vocal learning and the emergence of sound categorization during infancy. Few computational models of language processing exist, and fewer still are brain-inspired [4, 5, 6, 7, 8]. In this introduction, we first review computational models of early vocal learning, then present our architecture and discuss its advantages and limitations in comparison with these models.