Abstract

Production and comprehension of speech are closely interwoven. For example, the ability to detect an error in one's own speech, halt speech production, and finally correct the error can be explained by assuming an inner speech loop that continuously compares the word representations induced by production to those induced by perception at various cognitive levels (e.g., the conceptual, word, or phonological level). Because spontaneous speech errors are relatively rare, a picture naming and halt paradigm can be used to evoke them. In this paradigm, picture presentation (target word initiation) is followed by an auditory stop signal (distractor word) for halting speech production. The current study seeks to understand the neural mechanisms governing self-detection of speech errors by developing a biologically inspired neural model of the inner speech loop. The neural model is based on the Neural Engineering Framework (NEF) and consists of a network of about 500,000 spiking neurons. In the first experiment, we induced simulated speech errors semantically and phonologically. In the second experiment, we simulated a picture naming and halt task. Target-distractor word pairs were balanced with respect to variation of phonological and semantic similarity. The results of the first experiment show that speech errors are successfully detected by a monitoring component in the inner speech loop. The results of the second experiment show that the model correctly reproduces human behavioral data on the picture naming and halt task. In particular, the halting rate in the production of target words was lower for phonologically similar distractor words than for semantically similar or fully dissimilar distractor words. We thus conclude that the neural architecture proposed here to model the inner speech loop reflects important interactions between production and perception at the phonological and semantic levels.
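
To make the comparison idea concrete, here is a minimal sketch of such a production/perception monitor written with nengo_spa, a common Python implementation of the NEF and the Semantic Pointer Architecture. The dimensionality, the toy vocabulary, and the use of spa.Compare as the monitoring component are illustrative assumptions and do not reproduce the roughly 500,000-neuron architecture described in the paper.

```python
import nengo
import nengo_spa as spa

D = 64  # pointer dimensionality (illustrative; not taken from the paper)

with spa.Network() as model:
    vocab = spa.Vocabulary(D)
    vocab.populate("CAR; CAT; HOUSE")  # toy lexicon

    # Word representation induced by production (the intended word) ...
    produced = spa.Transcode("CAR", output_vocab=vocab)
    # ... and the representation fed back through the inner speech loop.
    perceived = spa.Transcode("CAT", output_vocab=vocab)

    # Monitoring component: neural dot product of the two representations.
    monitor = spa.Compare(vocab)
    produced >> monitor.input_a
    perceived >> monitor.input_b

    similarity = nengo.Probe(monitor.output, synapse=0.03)

with nengo.Simulator(model) as sim:
    sim.run(0.5)

# A persistently low similarity signals a production/perception mismatch,
# i.e. a detected speech error; a high value signals error-free production.
print(sim.data[similarity][-10:].mean())
```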

Highlights

  • The main goal of this study is to develop a neural architecture for speech production and perception that, on the one hand, enables fast, effortless, and error-free word production and, on the other hand, allows for the simulation of speech errors and for realistic and effective speech monitoring.

  • Because temporal aspects can be modeled in the Neural Engineering Framework, and because this model, like all spiking neuron models, generates trial-to-trial variation at the level of neural states and their processing, it was possible to test the quality of the model by (i) checking for the “natural” occurrence of speech errors, (ii) checking whether the model is capable of generating speech errors when ambivalent neural states are evoked at different cognitive levels within speech production by including “side branches,” and (iii) comparing the simulation results of a picture naming and halt task with human data (a toy illustration of such a mismatch-based halt decision is sketched after these highlights).

  • In this paper we have proposed a comprehensive spiking neuron model of the inner speech loop.
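
As a back-of-the-envelope illustration of point (iii), the toy script below (plain NumPy, not part of the published model) shows how a simple threshold on the production/perception mismatch could yield fewer halts for phonologically similar distractors than for dissimilar ones; the similarity value of 0.7 and the halt threshold of 0.5 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # vector dimensionality (arbitrary)

def random_pointer():
    """Unit-length random vector standing in for a phonological representation."""
    v = rng.standard_normal(D)
    return v / np.linalg.norm(v)

def similar_to(v, cos):
    """Return a unit vector with approximately the given cosine similarity to v."""
    noise = random_pointer()
    noise -= (noise @ v) * v          # remove the component along v
    noise /= np.linalg.norm(noise)
    return cos * v + np.sqrt(1.0 - cos**2) * noise

target = random_pointer()             # e.g. the picture name being produced
distractors = {
    "dissimilar": random_pointer(),
    "phonologically similar": similar_to(target, 0.7),  # assumed similarity
}

HALT_THRESHOLD = 0.5                  # arbitrary mismatch threshold

for label, d in distractors.items():
    mismatch = 1.0 - target @ d       # 1 - cosine similarity
    print(f"{label}: mismatch = {mismatch:.2f}, halt = {mismatch > HALT_THRESHOLD}")
```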

Introduction

Speech production is a hierarchical process that starts with the activation of an idea intended to be communicated, proceeds with the activation of words and their modification and sequencing with respect to grammatical and syntactic rules, and ends with the activation of a sequence of motor actions that realize the intended utterance (Dell and Reich, 1981; Dell et al., 1997; Levelt et al., 1999; Levelt and Indefrey, 2004; Riecker et al., 2005). Restricting our attention to single word production (as in a picture naming task), speech production starts with the activation of semantic concepts (e.g., “has wheels,” “can move,” “can transport persons”), retrieves an associated word (e.g., “car”) and its phonological form (/kar/) from the mental lexicon (see, e.g., Dell and Reich, 1981; Levelt et al., 1999), activates the relevant motor plan, which can be thought of as a collection of intended speech movements (such as: form the tongue, lower jaw, and lips for /k/ and for /ar/; in parallel, open the glottis for the production of the unvoiced speech sound /k/ and close it for the voiced sound /ar/), and executes these speech movements or actions in order to articulate the intended word and to generate the appropriate acoustic signal (see, e.g., Kröger and Cao, 2015). This production process mainly consists of two stages, one cognitive and one sensorimotor. The cognitive stage consists of concept activation, word selection, and the subsequent activation of the related phonological representation (Dell et al., 1997; Levelt et al., 1999), while the sensorimotor stage consists of motor plan activation (called motor planning) and execution (Riecker et al., 2005; Kröger and Cao, 2015).
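
To illustrate the cognitive stage of this cascade (semantic features, then word, then phonological form), the following is a minimal nengo_spa sketch with a one-word toy lexicon; the vocabularies, the feature bundle, the associative-memory thresholds, and the mapping are illustrative assumptions and are not taken from the model described later in the paper.

```python
import nengo
import nengo_spa as spa

D = 64  # pointer dimensionality (illustrative)

with spa.Network() as model:
    concepts = spa.Vocabulary(D)
    concepts.populate("HAS_WHEELS; CAN_MOVE; TRANSPORTS_PERSONS")
    # The concept "car" is represented as a normalized bundle of its features.
    concepts.add(
        "CAR_CONCEPT",
        concepts.parse("HAS_WHEELS + CAN_MOVE + TRANSPORTS_PERSONS").normalized(),
    )
    words = spa.Vocabulary(D)
    words.populate("CAR")
    phon = spa.Vocabulary(D)
    phon.populate("K_AR")

    # Cognitive stage 1: a bundle of semantic features is activated.
    features = spa.Transcode(
        "HAS_WHEELS + CAN_MOVE + TRANSPORTS_PERSONS", output_vocab=concepts
    )

    # Cognitive stage 2: concept -> word, via a thresholding associative
    # memory standing in for lexical selection.
    lexicon = spa.ThresholdingAssocMem(
        threshold=0.3, input_vocab=concepts, output_vocab=words,
        mapping={"CAR_CONCEPT": "CAR"},
    )

    # Cognitive stage 3: word -> phonological form (/kar/), the input to
    # motor planning and execution (not modeled in this sketch).
    phonology = spa.ThresholdingAssocMem(
        threshold=0.3, input_vocab=words, output_vocab=phon,
        mapping={"CAR": "K_AR"},
    )

    features >> lexicon
    lexicon >> phonology

    p_phon = nengo.Probe(phonology.output, synapse=0.03)

with nengo.Simulator(model) as sim:
    sim.run(0.3)

# Similarity of the phonological output to K_AR should approach 1.
print(spa.similarity(sim.data[p_phon], phon)[-1])
```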
