Animal Analogs of Speech Perception
The use of spoken language requires sophisticated perception and processing of speech sounds and sound patterns. It is still a matter of debate to what extent speech sound perception is based on special mechanisms unique to humans or on more general auditory processing mechanisms also present in other species. This article provides an overview of comparative research on the abilities of non-human animals (mammals and birds) to perceive and process speech sounds. It shows that phenomena like categorical perception of speech sounds, biases in vowel perception, context dependency of sound categorization, speaker normalization as well as speaker identification can all be found in several animal species. Also, like humans, animals can categorize speech sounds flexibly along different orthogonal dimensions (e.g., categorize speech sounds by word or by speaker sex) and are sensitive to prosodic patterns. While the perceptual sensitivities of humans and animals, as well as among different animal species, are not necessarily identical, they seem equivalent in their degree of sophistication. Although there is a need for more extensive and systematic comparative studies, human speech perception thus seems based on broadly shared sensitivities. These are likely to have arisen because all animals that use vocalizations to communicate need to be able to identify meaningful variations in sounds and sound patterns. Such sensitivities may have guided the evolution of speech sounds, rather than being a consequence of the evolution of spoken language.
- Research Article
10
- 10.1044/2022_jslhr-21-00276
- Jul 12, 2022
- Journal of Speech, Language, and Hearing Research
Evidence increasingly indicates that people with developmental stuttering have auditory perception deficits. Our previous research has indicated similar but slower performance in categorical perception of speech sounds under the quiet condition in children who stutter and adults who stutter (AWS) compared with their typically fluent counterparts. We hypothesized that the quiet condition may not be sufficiently sensitive to reveal subtle perceptual deficiencies in people who stutter. This study examined this hypothesis by testing the categorical perception of speech and nonspeech sounds under a backward masking condition (i.e., noise was presented immediately after the target stimuli). Fifteen Cantonese-speaking AWS and 15 adults who do not stutter (AWNS) were tested on the categorical perception of four stimulus continua, namely, consonants varying in voice onset time (VOT), vowels, lexical tones, and nonspeech sounds, under the backward masking condition using identification and discrimination tasks. AWS demonstrated a broader boundary width than AWNS in the identification task. AWS also performed worse than AWNS in the discrimination of between-category stimuli but comparably in the discrimination of within-category stimuli, indicating reduced sensitivity to sounds that belonged to different phonemic categories among AWS. Moreover, AWS showed similar patterns of impaired categorical perception across the four stimulus types, although the boundary location on the VOT continuum occurred at an earlier point in AWS than in AWNS. The findings provide robust evidence that AWS exhibit impaired categorical perception of speech and nonspeech sounds under the backward masking condition. Temporal processing (i.e., VOT manipulation), frequency/spectral/formant processing (i.e., lexical tone or vowel manipulations), and nonlinguistic pitch processing were all found to be impaired in AWS. Altogether, the findings support the hypothesis that AWS might be less efficient in accessing the phonemic representations when exposed to a demanding listening condition. https://doi.org/10.23641/asha.20249718.
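The abstract above reports boundary location and boundary width from an identification task without saying how they were computed. The following is a minimal sketch of one common approach, assuming a logistic identification function fitted by a least-squares line through logit-transformed response proportions, with width taken as the distance between the 25% and 75% points. The function names and the two response profiles are hypothetical illustrations, not the study's method or data.

```python
import math

def fit_logistic(steps, proportions, eps=1e-3):
    """Fit logit(p) = slope * step + intercept by ordinary least squares.

    Proportions are clipped away from 0 and 1 so the logit stays finite.
    """
    clipped = [min(max(p, eps), 1 - eps) for p in proportions]
    logits = [math.log(p / (1 - p)) for p in clipped]
    n = len(steps)
    mean_x = sum(steps) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in steps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, logits))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def boundary_and_width(slope, intercept):
    """Return the 50% crossover and the distance between the 25% and 75% points."""
    boundary = -intercept / slope
    # logit(0.75) = ln 3 and logit(0.25) = -ln 3, so the two points lie 2*ln(3)/|slope| apart
    width = 2 * math.log(3) / abs(slope)
    return boundary, width

# Hypothetical proportions of "category B" responses along a 7-step continuum
steps = [1, 2, 3, 4, 5, 6, 7]
sharp_profile = [0.02, 0.05, 0.15, 0.50, 0.85, 0.95, 0.98]    # steep identification function
shallow_profile = [0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.90]  # shallow identification function

sharp_boundary, sharp_width = boundary_and_width(*fit_logistic(steps, sharp_profile))
shallow_boundary, shallow_width = boundary_and_width(*fit_logistic(steps, shallow_profile))
print(f"sharp: boundary={sharp_boundary:.2f}, width={sharp_width:.2f}")
print(f"shallow: boundary={shallow_boundary:.2f}, width={shallow_width:.2f}")
```

Under this definition, a shallower identification slope yields a broader boundary width, which is the sense in which a broader width indicates less sharply categorical identification.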
- Research Article
46
- 10.1016/s0006-3223(98)00064-x
- Jan 1, 1999
- Biological Psychiatry
Impaired categorical perception of synthetic speech sounds in schizophrenia
- Research Article
5
- 10.1016/j.neuropsychologia.2022.108442
- Dec 5, 2022
- Neuropsychologia
Neuromodulation of the left auditory cortex with transcranial direct current stimulation (tDCS) has no effect on the categorical perception of speech sounds
- Research Article
279
- 10.1523/jneurosci.6018-08.2009
- Aug 5, 2009
- The Journal of neuroscience : the official journal of the Society for Neuroscience
Listening to speech modulates activity in human motor cortex. It is unclear, however, whether the motor cortex has an essential role in speech perception. Here, we aimed to determine whether the motor representations of articulators contribute to categorical perception of speech sounds. Categorization of continuously variable acoustic signals into discrete phonemes is a fundamental feature of speech communication. We used repetitive transcranial magnetic stimulation (rTMS) to temporarily disrupt the lip representation in the left primary motor cortex. This disruption impaired categorical perception of artificial acoustic continua ranging between two speech sounds that differed in place of articulation, in that the vocal tract is opened and closed rapidly either with the lips or the tip of the tongue (/ba/-/da/ and /pa/-/ta/). In contrast, it did not impair categorical perception of continua ranging between speech sounds that do not involve the lips in their articulation (/ka/-/ga/ and /da/-/ga/). Furthermore, an rTMS-induced disruption of the hand representation had no effect on categorical perception of either of the tested continua (/ba/-/da/ and /ka/-/ga/). These findings indicate that motor circuits controlling production of speech sounds also contribute to their perception. Mapping acoustically highly variable speech sounds onto less variable motor representations may facilitate their phonemic categorization and be important for robust speech perception.
- Research Article
21
- 10.1037/xlm0001213
- Jul 1, 2023
- Journal of Experimental Psychology: Learning, Memory, and Cognition
Individuals differ in their ability to perceive and learn unfamiliar speech sounds, but we lack a comprehensive theoretical account that predicts individual differences in this skill. Predominant theories largely attribute difficulties of non-native speech perception to the relationships between non-native speech sounds/contrasts and native-language categories. The goal of the current study was to test whether the predictions made by these theories can be extended to predict individual differences in naive perception of non-native speech sounds or learning of these sounds. Specifically, we hypothesized that the internal structure of native-language speech categories is the cause of difficulty in perception of unfamiliar sounds, such that learners who show more graded (i.e., less categorical) perception of sounds in their native language would have an advantage for perceiving non-native speech sounds because they would be less likely to assimilate unfamiliar speech tokens to their native-language categories. We tested this prediction in two experiments in which listeners categorized speech continua in their native language and performed tasks of discrimination or identification of difficult non-native speech sound contrasts. Overall, results did not support the hypothesis that individual differences in categorical perception of native-language speech sounds are responsible for variability in sensitivity to non-native speech sounds. However, participants who responded more consistently on a speech categorization task showed more accurate perception of non-native speech sounds. This suggests that individual differences in non-native speech perception are more related to the stability of phonetic processing abilities than to individual differences in phonetic category structure.
- Research Article
8
- 10.1080/02699206.2020.1803407
- Aug 12, 2020
- Clinical Linguistics & Phonetics
Stuttering is often attributed to an impaired speech production system; however, there is growing evidence implicating issues in speech perception. Our previous research showed that children who stutter have similar patterns but slower categorical perception (i.e., the ability to categorise different acoustic variations of speech sounds into the same or different phonemic categories) compared to children who do not stutter. This study aimed to extend our previous research to adults who stutter (AWS) using the same categorical perception paradigm. Fifteen AWS and 15 adults who do not stutter (AWNS) were recruited to complete identification and discrimination tasks involving acoustic variations of Cantonese speech sounds in four stimulus contexts: consonants (varying in voice onset times, VOTs), lexical tones, vowels and pure tones. The results showed similar categorical perception between the two groups in terms of the boundary position and width in the identification task and between-category benefits in the discrimination task. However, there were some trends for lower discrimination accuracy (overall d′ scores) and slower discrimination of the between-category stimuli versus within-category stimuli for AWS than AWNS. These results partially confirm our previous finding on children in terms of a comparable pattern of categorical perception between the two groups, but slower processing speed to access the phoneme representations in speech perception among AWS than AWNS.
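The abstract above reports discrimination accuracy as d′ scores without giving the formula used. The following is a minimal sketch of the textbook yes/no sensitivity index, d′ = z(hit rate) − z(false-alarm rate), assuming a log-linear correction (adding 0.5 to each cell) so that perfect scores stay finite. Same-different discrimination designs often call for a different d′ model, so this is an illustration of the measure rather than the study's analysis; the function name and trial counts are hypothetical.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Yes/no sensitivity index with a log-linear correction for extreme rates."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical trial counts: "different" responses to different pairs count as hits,
# "different" responses to identical pairs count as false alarms.
between_category = d_prime(hits=45, misses=5, false_alarms=5, correct_rejections=45)
within_category = d_prime(hits=35, misses=15, false_alarms=15, correct_rejections=35)
print(f"between-category d' = {between_category:.2f}")
print(f"within-category d' = {within_category:.2f}")
```

A larger d′ for between-category pairs than for within-category pairs is the pattern usually described as a between-category benefit.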
- Research Article
- 10.3389/conf.neuro.09.2009.05.091
- Jan 1, 2009
- Frontiers in Human Neuroscience
Differential cross-modal effect of speech sounds on the vMMN elicited by letter versus non-letter deviants
Dries Froyen (1, 2), N. Van-Atteveldt (1, 3) and L. Blomert (1, 2). 1: Faculty of Psychology and Cognitive Neuroscience, Maastricht University, Netherlands; 2: Maastricht Brain Imaging Center (M-BIC), Netherlands; 3: College of Physicians and Surgeons, Columbia University, United States.
Appropriate development of letter-speech sound associations is considered crucial for reading acquisition. Brain imaging evidence indicated that heteromodal areas in superior temporal sulcus and modality-specific auditory cortex are involved in letter-speech sound processing. The role of early visual areas, however, remains unclear. In a previous auditory MMN (aMMN) study, a cross-modal effect of letters on speech sound processing was observed. aMMN amplitude to speech sounds presented simultaneously with letters was higher than aMMN amplitude to speech sounds presented in isolation, indicating early and automatic letter-speech sound integration. In the present study the visual counterpart of the aMMN, the vMMN, is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters either in isolation or simultaneously with a speech sound. As early as 150-220 ms after stimulus onset, influences of the speech sound on visual processing were observed. However, these influences were similar for letter and non-letter processing and thus not letter-specific. In a later time window, between 240 and 350 ms, a differential effect of the speech sound on letter and non-letter processing was observed. Interestingly, the speech sound did not influence letter processing but suppressed non-letter processing. The present results support previous fMRI findings on letter-speech sound processing, in which the early visual areas were not involved when passively processing letters and speech sounds. This implies that, although letter-speech sound processing might rely on the evolutionarily older audiovisual speech processing system, both systems recruit early visual areas by means of a different mechanism. The present study furthermore provides an appropriate tool to investigate, with a high temporal resolution and non-invasively, cross-modal effects on visual processing.
Conference: MMN 09 Fifth Conference on Mismatch Negativity (MMN) and its Clinical and Scientific Applications, Budapest, Hungary, 4 Apr - 7 Apr, 2009. Presentation Type: Poster Presentation. Citation: Froyen D, Van-Atteveldt N and Blomert L (2009). Differential cross-modal effect of speech sounds on the vMMN elicited by letter versus non-letter deviants. Conference Abstract: MMN 09 Fifth Conference on Mismatch Negativity (MMN) and its Clinical and Scientific Applications. doi: 10.3389/conf.neuro.09.2009.05.091. Received: 25 Mar 2009; Published Online: 25 Mar 2009.
Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/).
Correspondence: Dries Froyen, Faculty of Psychology and Cognitive Neuroscience, Maastricht University, 6200 MD Maastricht, Netherlands, d.froyen@psychology.unimaas.nl
- Research Article
24
- 10.1016/j.anl.2010.05.007
- Jul 5, 2010
- Auris Nasus Larynx
Random gap detection test and random gap detection test-expanded: Results in children with previous language delay in early childhood
- Research Article
58
- 10.1016/s0920-9964(01)00382-6
- Nov 27, 2001
- Schizophrenia Research
Neuromagnetic correlates of impaired automatic categorical perception of speech sounds in schizophrenia
- Research Article
97
- 10.1121/1.1289701
- Oct 1, 2000
- The Journal of the Acoustical Society of America
All chapters conclude with Summary.
Preface.
I. ACOUSTIC PHONETICS: SPEECH ENCODING FROM ARTICULATION TO SOUND STREAM.
1. Language, Phonetics, and Speech Production. Introduction: Language and Science. How Significant Is Acoustic Phonetics? Linguistics, Phonetics, and Phonology. General Conditions of Speech Production. Speech Sound Sources.
2. Sounds, Resonance, and Spectrum Analysis. Sound Production and Propagation. Simple Harmonic Motion. Definitions of Sine Wave Characteristics. Resonance. Definition of Resonant Frequency. Spectrum Analysis. Definitions of Spectrum Terms. Spectra of Recurring Resonant Oscillations. Definitions of Harmonics. Resonant Waves, Spectrum Plots, and Speech Waves. Aperiodic Speech Sounds.
3. Vowel Shaping and Vowel Formants. Model of Pharyngeal-Oral Tract. Spectrum of the Neutral Vowel [ə]. Definition of Speech Formants. Vowel Formant Locations and Length of Pharyngeal-Oral Tract. Vocal Tract Constrictions and Formant Frequency Locations. Formants of Model Vowels. Central Vowels.
4. The Glottal Sound Source and the Spectra of Vowels. The Glottal Sound Source. The Phonation Mechanism. The Spectrum of the Glottal Sound Source. Source-Filter Theory of Vowel Production. Visualizing Speech Sounds. Anatomy of Spectrogram. Spectrograms of Vowels. Nasalization of Vowels. Phonemic Nasal Vowels.
5. Prosodic and Tonal Features. Introduction: Telling What and How. Parenting Speech. Prosodic Features of Language Forms. Glottal Source Factors in Stress and Intonation. Durational Prosodic Features. Oral Tract Shaping Factor. Intonation in Discourse. Pacing, Rhythm, and Languages. Tone Languages.
6. Consonant Features, Glides, and Stops. Articulatory Features of Consonants. Distinctive Features. Glide Consonants and Diphthongs. Glide and Voiced Stop. Glide and Stop at Middle Pace. Lateral and Retroflex Glides. Effects of Utterance Position.
7. Consonants: Nasal, Stop, and Fricative Manners of Articulation. Nasal Consonants. Nasal-Glide-Stop Differences. Fricative Consonants.
8. Consonants: The Voiced-Unvoiced Contrast. Production of the Voiced-Voiceless Distinction. Acoustics of Consonant Voicing. Voiced versus Unvoiced Final Consonants. Voiced and Unvoiced Fricatives. Physiological Studies of Consonant Voicing. Physiological Studies of Fricative Voicing.
9. Consonants: Features of Place of Articulation. Formant Transitions of Alveolar versus Labial Consonants. Consonant Place: Transition with Different Vowels. Place Features of Nasal Consonants. Place Features of Fricative Consonants.
10. The Flow of Speech. Coarticulation. Model of Speech Motor Programming. The Syllable as Coarticulation Unit. Effects of Rate of Utterance. Assimilation between Adjacent Consonants.
II. SPEECH DECODING BY HUMAN AND MACHINE: FROM SOUND STREAM TO WORDS.
11. Acoustic Cues to Speech Perception, Winifred Strange. Perception of Steady-State Vowels. Speaker Normalization in Vowel Perception. Perception of Coarticulated Vowels.
12. Perception of Vowels: Dynamic Constancy, Winifred Strange. Boundaries of Consonant Categories. Categorical Perception of Speech Continua. Perception of Nonspeech Analogs. Integration of Acoustic Cues. Context and Rate Effects on Phonetic Category Boundaries.
13. Auditory Capacities and Phonological Development: Animal, Baby, and Foreign Listeners, Sarah Hawkins. Background: Does Phonological Perception Use Special Auditory Processes? Techniques for Studying Babies' Speech Perception. Categorical Perception by Babies. Speech Sound Classification by Babies: Prototype and the Perceptual Magnet Effect. Speech Sound Classification by Babies: Constancy. Speech Sound Discrimination by Animals. The Effect of Experience on Speech Sound Discrimination. Development Loss or Selective Attention? The Effect of Retraining.
14. Looking for Invariant Correlates of Linguistic Units: Two Classical Theories of Speech Perception, Sarah Hawkins. Defining the Task of Speech Perception. Overview of Acoustic-Phonetic Theories of Speech Perception. Introduction to Two Classical Theories of Speech Perception. The Motor Theory of Speech Perception. Acoustical Invariants: The Quantal Theory of Speech, Relational Acoustic Invariance, and Lexical Access from Features (LAFF).
15. Reevaluating Assumptions about Speech Perception: Interactive and Integrative Theories, Sarah Hawkins. What Have We Learned from the Classical Theories? Invariance in the Percept but Not the Object: The Theory of Direct Realism. A General Auditory Model without Acoustic Variance: Auditory Enhancement Theory. Categories of Sound? Toward a More Comprehensive Theory of Speech Perception.
16. Hearing Loss and the Audibility of Phoneme Cues, Sally G. Revoile. Consonant Acoustic-Cue Use by a Hypothetical Profoundly Hard of Hearing Person. Phoneme Acoustic-Cue Use by a Hypothetical Severely Hard of Hearing Person. Phoneme Acoustic-Cue Use by a Hypothetical Moderately Hard of Hearing Person.
17. Speech Technology. Speech Machines, J. M. Pickett & Juergen Schroeter. Speech Synthesis, Corine Bickley, Ann Syrdal, & Juergen Schroeter. Speech Recognition by Machine, Diane Kewley-Port.
Appendixes. A: Experimenting with Speech. B: Sketches of Some Interesting Books for Phoneticians.
References. Index.
- Research Article
192
- 10.1016/j.neuroimage.2006.01.004
- Feb 28, 2006
- NeuroImage
Locating the initial stages of speech–sound processing in human temporal cortex
- Research Article
18
- 10.1016/j.cognition.2021.104687
- Mar 31, 2021
- Cognition
The impact of alphabetic literacy on the perception of speech sounds
- Book Chapter
3
- 10.1007/978-1-4419-5686-6_21
- Jan 1, 2010
The purpose of this paper is to draw attention to the definition of timbre as it pertains to the vowels of speech. There are two forms of size information in these “source-filter” sounds, information about the size of the excitation mechanism (the vocal folds), and information about the size of the resonators in the vocal tract that filter the excitation before it is projected into the air. The current definitions of pitch and timbre treat the two forms of size information differently. In this paper, we argue that the perception of speech sounds by humans suggests that the definition of timbre would be more useful if it grouped the size variables together and separated the pair of them from the remaining properties of these sounds.
- Book Chapter
- 10.1075/rmal.9.03tre
- Feb 17, 2025
The present chapter provides an overview of methodological considerations for research on French speech perception and spoken word recognition by second/foreign language (L2) learners. French has several phonetic and phonological characteristics that are relatively infrequent cross-linguistically and thus make it a very interesting L2 to investigate from the perspective of speech perception and spoken word recognition. The chapter provides a review of the methodological approaches used in studies that focus on these characteristics, considering both the perception and processing of speech sounds that are lexically contrastive and the perception and processing of speech sounds that signal word boundaries. The chapter also identifies areas for further investigation and open research questions, and it makes theoretical and methodological recommendations that take advantage of the unique properties of French speech for further advancing research on L2 learners’ speech perception and spoken word recognition.
- Research Article
13
- 10.3758/cabn.9.3.304
- Sep 1, 2009
- Cognitive, Affective, & Behavioral Neuroscience
Our native language has a lifelong effect on how we perceive speech sounds. Behaviorally, this is manifested as categorical perception, but the neural mechanisms underlying this phenomenon are still unknown. Here, we constructed a computational model of categorical perception, following principles consistent with infant speech learning. A self-organizing network was exposed to a statistical distribution of speech input presented as neural activity patterns of the auditory periphery, resembling the way sound reaches the human brain. In the resulting neural map, categorical perception emerges because most single neurons of the model are maximally activated by prototypical speech sounds, while the largest variability in activity is produced at category boundaries. Consequently, regions in the vicinity of prototypes become perceptually compressed, and regions at category boundaries become expanded. Thus, the present study offers a unifying framework for explaining the neural basis of the warping of perceptual space associated with categorical perception.
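The abstract above describes a self-organizing network exposed to a statistical distribution of speech input. The following is a toy sketch of that general idea, assuming a one-dimensional Kohonen-style map trained on a single acoustic dimension drawn from two Gaussian categories. It is not the authors' model, which used auditory-periphery activity patterns; the function name, parameter values, category means, and the counting of units at the end are all hypothetical choices made for illustration.

```python
import math
import random

def train_som(samples, n_units=20, lo=0.0, hi=8.0,
              lr_start=0.5, lr_end=0.01, sigma_start=2.0, sigma_end=0.3):
    """Train a 1-D self-organizing map on scalar inputs and return the unit weights."""
    # Start with units spread evenly over the stimulus range
    weights = [lo + (hi - lo) * i / (n_units - 1) for i in range(n_units)]
    n = len(samples)
    for t, x in enumerate(samples):
        frac = t / n
        # Learning rate and neighbourhood width decay exponentially over training
        lr = lr_start * (lr_end / lr_start) ** frac
        sigma = sigma_start * (sigma_end / sigma_start) ** frac
        # Best-matching unit: the unit whose weight is closest to the input
        winner = min(range(n_units), key=lambda i: abs(weights[i] - x))
        for i in range(n_units):
            # Gaussian neighbourhood over map index distance
            h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
            weights[i] += lr * h * (x - weights[i])
    return weights

# Hypothetical bimodal input: two categories with prototypes at 2.0 and 6.0
rng = random.Random(0)
samples = [rng.gauss(rng.choice([2.0, 6.0]), 0.5) for _ in range(20000)]
weights = train_som(samples)

# Count units tuned near either prototype versus units tuned near the boundary at 4.0
near_prototypes = sum(1 for w in weights if min(abs(w - 2.0), abs(w - 6.0)) <= 1.0)
near_boundary = sum(1 for w in weights if abs(w - 4.0) < 1.0)
print("units near prototypes:", near_prototypes)
print("units near boundary:", near_boundary)
```

In this toy setting the map allocates most of its units to the frequently heard prototype regions, which loosely parallels the abstract's statement that most model neurons respond maximally to prototypical sounds; how that tuning translates into perceptual compression and expansion is specific to the published model and is not reproduced here.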