COSMO-Onset: A Neurally-Inspired Computational Model of Spoken Word Recognition, Combining Top-Down Prediction and Bottom-Up Detection of Syllabic Onsets.

Mamady Nabé, Jean-Luc Schwartz, Julien Diard

doi:10.3389/fnsys.2021.653975

Abstract

Recent neurocognitive models commonly consider speech perception as a hierarchy of processes, each corresponding to specific temporal scales of collective oscillatory processes in the cortex: 30–80 Hz gamma oscillations in charge of phonetic analysis, 4–9 Hz theta oscillations in charge of syllabic segmentation, 1–2 Hz delta oscillations processing prosodic/syntactic units and the 15–20 Hz beta channel possibly involved in top-down predictions. Several recent neuro-computational models thus feature theta oscillations, driven by the speech acoustic envelope, to achieve syllabic parsing before lexical access. However, it is unlikely that such syllabic parsing, performed in a purely bottom-up manner from envelope variations, would be totally efficient in all situations, especially in adverse sensory conditions. We present a new probabilistic model of spoken word recognition, called COSMO-Onset, in which syllabic parsing relies on fusion between top-down, lexical prediction of onset events and bottom-up onset detection from the acoustic envelope. We report preliminary simulations, analyzing how the model performs syllabic parsing and phone, syllable and word recognition. We show that, while purely bottom-up onset detection is sufficient for word recognition in nominal conditions, top-down prediction of syllabic onset events allows overcoming challenging adverse conditions, such as when the acoustic envelope is degraded, leading either to spurious or missing onset events in the sensory signal. This provides a proposal for a possible computational functional role of top-down, predictive processes during speech recognition, consistent with recent models of neuronal oscillatory processes.
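The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the COSMO-Onset model itself, whose probabilistic definition is not reproduced here. Everything below is an assumption made for illustration: the toy envelopes, the `rise` threshold, the hypothetical top-down onset probabilities, and the two fusion rules (a product rule labeled "and" and a noisy-OR rule labeled "or").

```python
from typing import List


def bottom_up_onsets(envelope: List[float], rise: float = 0.2) -> List[float]:
    """Toy bottom-up detector: 1.0 at frames where a sharp rise begins, else 0.0."""
    onsets = [0.0] * len(envelope)
    for t in range(1, len(envelope)):
        rising = envelope[t] - envelope[t - 1] >= rise
        was_rising = t >= 2 and envelope[t - 1] - envelope[t - 2] >= rise
        if rising and not was_rising:  # flag only the first frame of a rise
            onsets[t] = 1.0
    return onsets


def fuse(bottom_up: List[float], top_down: List[float], mode: str) -> List[float]:
    """Combine per-frame onset probabilities (illustrative rules, not the paper's)."""
    if mode == "and":  # both sources must agree: filters spurious events
        return [b * p for b, p in zip(bottom_up, top_down)]
    if mode == "or":  # either source suffices: recovers missing events
        return [1.0 - (1.0 - b) * (1.0 - p) for b, p in zip(bottom_up, top_down)]
    raise ValueError(f"unknown fusion mode: {mode}")


def onset_frames(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Indices of frames whose onset probability reaches the threshold."""
    return [t for t, p in enumerate(probs) if p >= threshold]


# Hypothetical loudness envelopes for a two-syllable word (arbitrary units).
clean = [0.0, 0.5, 1.0, 0.6, 0.2, 0.0, 0.5, 1.0, 0.5, 0.0]
noisy = [0.0, 0.5, 1.0, 0.6, 0.9, 0.0, 0.5, 1.0, 0.5, 0.0]  # spurious bump at frame 4
hypo = [0.0, 0.5, 1.0, 0.9, 0.9, 0.9, 1.0, 1.0, 0.5, 0.0]   # dip between syllables flattened

# Hypothetical top-down (lexical) prediction of onsets at frames 1 and 6.
top_down = [0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0]

clean_frames = onset_frames(bottom_up_onsets(clean))
noisy_frames = onset_frames(bottom_up_onsets(noisy))
hypo_frames = onset_frames(bottom_up_onsets(hypo))
noisy_fused = onset_frames(fuse(bottom_up_onsets(noisy), top_down, "and"))
hypo_fused = onset_frames(fuse(bottom_up_onsets(hypo), top_down, "or"))

print("clean, bottom-up only:", clean_frames)
print("noisy, bottom-up only:", noisy_frames, "fused (and):", noisy_fused)
print("hypo,  bottom-up only:", hypo_frames, "fused (or):", hypo_fused)
```

In this toy setting, bottom-up detection alone is enough for the clean envelope, while the degraded envelopes yield one spurious and one missing onset; combining with the top-down prediction restores the two expected onset frames. Which fusion rule suits which kind of degradation is a property of this sketch and should not be read as a result of the paper.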

Highlights

Speech processing is classically conceived as a hierarchical process that can be broken down into several processing steps, from the low-level extraction of phonetic and prosodic cues, to their higher-level integration into lexical units and syntactic phrases, and to global comprehension. This hierarchical organization may be related to a hierarchy of temporal scales, from short-term phonetic analysis at a temporal scale of tens of milliseconds, to syllabic envelope modulations around 200 ms, and slower prosodic-syntactic phrases with durations typically on the order of a second.
These temporal scales are found in all languages of the world, and, in particular, the regularity of syllabic rhythms has been the focus of a large number of studies (Ramus et al., 1999; Pellegrino et al., 2011; Ding et al., 2017).
The complete mathematical definition of the model is provided in Supplementary Materials; here, instead, we describe the overall structure of the model and its resulting simulation of spoken word recognition processes.

Summary

Neural Oscillations and Multi-Scale Speech Analysis

Speech processing is classically conceived as a hierarchical process that can be broken down into several processing steps, from the low-level extraction of phonetic and prosodic cues, to their higher-level integration into lexical units and syntactic phrases, and to global comprehension. This hierarchical organization may be related to a hierarchy of temporal scales, from short-term phonetic analysis at a temporal scale of tens of milliseconds, to syllabic envelope modulations around 200 ms, and slower prosodic-syntactic phrases with durations typically on the order of a second. Top-down information from various stages of the speech perception process would be fed back to lower processing stages, possibly exploiting the beta band (15–20 Hz), which is assumed to be a relevant channel for providing such descending predictions (Engel and Fries, 2010; Arnal, 2012; Arnal and Giraud, 2012; Sohoglu et al., 2012; Rimmele et al., 2018).

Neuro-Computational Models of Syllabic Segmentation

MODEL ARCHITECTURE

General Principles

Decoding Module

Temporal Control Module

Linguistic Material

Phonetic Material

Phone Duration and Loudness Profiles

Paradigms for Test Conditions

Simulation Configuration

Performance Measures

RESULTS

Illustrative Example in Nominal Condition

Noisy-Event Condition

Hypo-Articulation-Event Condition

DISCUSSION

DATA AVAILABILITY STATEMENT

Journal: Frontiers in Systems Neuroscience
Publication Date: Aug 4, 2021
Citations: 5
License type: CC BY 4.0
