Selecting frames for automatic speech recognition based on acoustic landmarks

Di He, Xuesong Yang, Mark Hasegawa-Johnson, Boon Pang P Lim, Deming Chen

doi:10.1121/1.4987204

Abstract

Most mainstream Automatic Speech Recognition (ASR) systems based on Mel-frequency cepstral coefficients (MFCCs) treat all feature frames as equally important. Acoustic landmark theory disagrees with this idea. It exploits the quantal, non-linear articulatory-acoustic relationships observed in human speech perception experiments, and it provides a theoretical basis for extracting acoustic features in the vicinity of landmark regions, where an abrupt change occurs in the spectrum of the speech signal. In this work, we conducted experiments on the TIMIT corpus with both GMM-based and DNN-based ASR systems and found that frames containing landmarks are more informative than other frames during recognition. We showed that altering the level of emphasis on landmark and non-landmark frames, by re-weighting or removing the corresponding frame acoustic likelihoods, changes the phone error rate (PER) of the ASR system in a way dramatically different from making similar changes to random frames. Furthermore, using landmarks as a heuristic, one of our hybrid DNN frame-dropping strategies increased PER by only 0.44% while scoring fewer than half of the frames (41.2%). This hybrid strategy outperforms other, non-heuristic methods and demonstrates the potential of landmarks for reducing computation in ASR.
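The re-weighting and frame-dropping ideas described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weights or the hybrid dropping rule, so everything below is an assumption made for illustration: the function names `reweight_frames` and `frames_to_score`, the parameters `landmark_weight`, `other_weight`, and `keep_every`, and the rule "keep every landmark frame plus every k-th remaining frame" are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set


def reweight_frames(
    loglikes: Sequence[Sequence[float]],
    landmark_frames: Set[int],
    landmark_weight: float = 1.0,
    other_weight: float = 1.0,
) -> List[List[float]]:
    """Scale each frame's acoustic log-likelihoods by a per-frame weight.

    loglikes[t][s] is the log-likelihood of state s at frame t.
    A weight of 0.0 removes a frame's influence on the decoder's path score.
    (Hypothetical helper; the paper's actual weighting scheme is not given here.)
    """
    weighted = []
    for t, frame in enumerate(loglikes):
        w = landmark_weight if t in landmark_frames else other_weight
        weighted.append([w * ll for ll in frame])
    return weighted


def frames_to_score(
    num_frames: int, landmark_frames: Set[int], keep_every: int = 2
) -> List[int]:
    """Assumed hybrid rule: score every landmark frame plus every
    keep_every-th frame overall; all other frames are skipped."""
    return [
        t for t in range(num_frames)
        if t in landmark_frames or t % keep_every == 0
    ]


# Toy data: 4 frames, 2 states; frame 2 is marked as a landmark.
loglikes = [[-1.0, -2.0], [-0.5, -3.0], [-2.0, -0.1], [-1.5, -1.5]]
weighted = reweight_frames(loglikes, {2}, landmark_weight=2.0, other_weight=0.5)

# Toy dropping example: 6 frames, frame 3 is a landmark.
kept = frames_to_score(6, {3}, keep_every=2)
fraction_scored = len(kept) / 6
```

In this toy setup, emphasising the landmark frame doubles its contribution to any path score while halving the others, and the dropping rule scores four of the six frames. A real system would apply the same idea to the acoustic model's output before decoding, and only the kept frames would need a forward pass through the DNN.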

Journal: The Journal of the Acoustical Society of America
Publication Date: May 1, 2017
Citations: 3

Similar Papers

Acoustic landmarks contain more information about the phone string than other frames for automatic speech recognition with deep neural network acoustic model.
Di He ... Mark Hasegawa-Johnson
The Journal of the Acoustical Society of America | VOL. 143
01 Jun 2018

Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems
Kartik Audhkhasi ... Shrikanth S Narayanan
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 22
01 Mar 2014

Interaction between people with dysarthria and speech recognition systems: A review
Aisha Jaddoh ... Omer Rana
Assistive Technology | VOL. 35
16 Apr 2022

Automatic speech recognition (ASR) processing using confidence measures
Douglas J Brems
The Journal of the Acoustical Society of America | VOL. 102
01 Jul 1997
