Perception of Phonological Assimilation by Neural Speech Recognition Models

Charlotte Pouw, Marianne De Heer Kloots, Afra Alishahi, Willem Zuidema

doi:10.1162/coli_a_00526

Abstract

Human listeners effortlessly compensate for phonological changes during speech perception, often unconsciously inferring the intended sounds. For example, listeners infer the underlying /n/ when hearing an utterance such as “clea[m] pan”, where [m] arises from place assimilation to the following labial [p]. This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds, and identifies the linguistic knowledge that the model implements to compensate for assimilation during Automatic Speech Recognition (ASR). Using psycholinguistic stimuli, we systematically analyze how various linguistic context cues influence compensation patterns in the model’s output. Complementing these behavioral experiments, our probing experiments indicate that the model shifts its interpretation of assimilated sounds from their acoustic form to their underlying form in its final layers. Finally, our causal intervention experiments suggest that the model relies on minimal phonological context cues to accomplish this shift. These findings represent a step towards better understanding the similarities and differences in phonological processing between neural ASR models and humans.

Journal: Computational Linguistics
Publication Date: Jul 22, 2024
License type: CC BY-NC-ND 4.0
