Abstract

Automatic speech recognition systems based on end-to-end models (E2E-ASRs) can match the performance of conventional ASR systems while learning all of their essential components automatically, from speech units to the language model. However, they hide the underlying perceptual processes they model, if any; they are less adaptable to multiple application contexts; and they require powerful hardware and extensive training data. Model-explainability techniques can explore the internal dynamics of these ASR systems and potentially explain the processes leading to their decisions and outputs. Understanding these processes can help enhance ASR performance and significantly reduce the required training data and hardware. In this paper, we probe the internal dynamics of three E2E-ASRs pre-trained for English by building an acoustic-syllable boundary detector for Italian and Spanish on top of the E2E-ASRs’ internal encoder layer outputs. We demonstrate that the shallower E2E-ASR layers spontaneously form a rhythmic component correlated with prominent syllables, which are central to human speech processing. This finding highlights a parallel between the analysed E2E-ASRs and human speech recognition. Our results contribute to the body of knowledge by providing human-explainable insight into behaviours encoded in popular E2E-ASR systems.
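The abstract does not name the three E2E-ASRs or the probe architecture, so the following is only a minimal sketch of the kind of layer-wise probing it describes, assuming a wav2vec 2.0 encoder from HuggingFace `transformers` and a simple logistic-regression probe; `MODEL_NAME`, `PROBE_LAYER`, `waveform`, and `frame_labels` are illustrative placeholders, not the paper's actual choices.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

# Hypothetical choices: the abstract names neither the models nor the probe.
MODEL_NAME = "facebook/wav2vec2-base-960h"  # an English pre-trained E2E encoder
PROBE_LAYER = 3                             # a "shallow" encoder layer

extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = Wav2Vec2Model.from_pretrained(MODEL_NAME)
model.eval()

def layer_features(waveform, layer=PROBE_LAYER):
    """Return per-frame hidden states (frames, dim) from one encoder layer.

    `waveform` is a 1-D float array sampled at 16 kHz (e.g. Italian or
    Spanish speech); each output frame covers roughly 20 ms.
    """
    inputs = extractor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states is a tuple of (num_layers + 1) tensors of shape
    # (1, frames, dim); index selects the layer being probed.
    return out.hidden_states[layer].squeeze(0).numpy()

# frame_labels: hypothetical frame-level annotation, 1 where a syllable
# boundary falls within the frame, 0 elsewhere (same length as the features).
# X, y = layer_features(waveform), frame_labels
# probe = LogisticRegression(max_iter=1000).fit(X, y)
# boundary_probability = probe.predict_proba(X)[:, 1]
```

Probing each encoder layer this way and comparing the probes' boundary-detection accuracy is one standard means of locating where a rhythmic, syllable-correlated component emerges in the network.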
