Abstract

Humans can recognize spoken words with unmatched speed and accuracy. Hearing the initial portion of a word such as “formu…” is sufficient for the brain to identify “formula” from the thousands of other words that partially match [1–6]. Two alternative computational accounts propose that partially matching words (1) inhibit each other until a single word is selected (“formula” inhibits “formal” by lexical competition [7–9]) or (2) are used to predict upcoming speech sounds more accurately (segment prediction error is minimal after sequences like “formu…” [10–12]). To distinguish these theories we taught participants novel words (e.g., “formubo”) that sound like existing words (“formula”) on two successive days [13–16]. Computational simulations show that knowing “formubo” increases lexical competition when hearing “formu…”, but reduces segment prediction error. Conversely, when the sounds in “formula” and “formubo” diverge, the reverse is observed. The time course of magnetoencephalographic brain responses in the superior temporal gyrus (STG) is uniquely consistent with a segment prediction account. We propose a predictive coding model of spoken word recognition in which STG neurons represent the difference between predicted and heard speech sounds. This prediction error signal explains the efficiency of human word recognition and simulates neural responses in auditory regions.

Highlights

  • All current accounts of spoken word recognition propose that identification occurs once speech segments that uniquely identify a single item are heard [7–9].

  • The dominant proposal in current computational models of spoken word recognition is that multiple lexical candidates are activated in parallel and compete through inhibitory connections [7] or other, functionally equivalent lexical mechanisms [8, 9].

  • The bottom panel shows experimental predictions for neural correlates of (E) lexical entropy and (F) segment prediction error measures for six critical conditions in our experiment, averaged over speech segments before and after the Deviation Point (DP) for the item set “formula,” “formubo,” and “formuty.” These profiles are typical of the pattern observed for all 216 triples in our item set.
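The contrast between the two measures can be made concrete with a toy calculation. The sketch below is illustrative only and is not the simulation reported in the paper: it assumes letters stand in for speech segments, a four-word lexicon with equal weights, lexical entropy computed over the words still matching the input, and segment prediction error taken as one minus the probability of the heard segment. The function names and weights are hypothetical.

```python
import math

def posterior(prefix, lexicon):
    """p(word | prefix): weight of each word that still matches the input, normalised."""
    matches = {w: f for w, f in lexicon.items() if w.startswith(prefix)}
    total = sum(matches.values())
    return {w: f / total for w, f in matches.items()} if total > 0 else {}

def lexical_entropy(prefix, lexicon):
    """Uncertainty (in bits) over the lexical candidates given the input so far."""
    return -sum(p * math.log2(p) for p in posterior(prefix, lexicon).values())

def segment_prediction_error(prefix, heard, lexicon):
    """Assumed measure: 1 - p(heard segment | prefix), derived from the word posterior."""
    post = posterior(prefix, lexicon)
    p_heard = sum(p for w, p in post.items() if w.startswith(prefix + heard))
    return 1.0 - p_heard

# Hypothetical equal weights; a real simulation would use word frequencies and phonemes.
before_learning = {"formula": 1.0, "formal": 1.0, "format": 1.0}
after_learning = dict(before_learning, formubo=1.0)

for label, lex in [("before", before_learning), ("after", after_learning)]:
    print(label,
          "entropy at 'formu':", lexical_entropy("formu", lex),
          "| error for 'u' after 'form':", segment_prediction_error("form", "u", lex),
          "| error for 'l' after 'formu':", segment_prediction_error("formu", "l", lex))
```

Under these assumptions, adding “formubo” raises the entropy at “formu” from 0 to 1 bit, while the prediction error for the segment “u” after “form” falls from 2/3 to 1/2. At the divergence (“l” after “formu”), prediction error rises from 0 to 1/2.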



Results
[Figure panels: (A) Lexical competition; (B) Segment prediction]