Short-time phase spectrum in speech processing: A review and some experimental results

Leigh D Alsteris, Kuldip K Paliwal

doi:10.1016/j.dsp.2006.06.007

Abstract

Incorporating information from the short-time phase spectrum into a feature set for automatic speech recognition (ASR) may improve recognition accuracy. Currently, however, it is common practice to discard this information in favour of features derived purely from the short-time magnitude spectrum. There are two reasons for this: (1) some well-known human listening experiments have indicated that the short-time phase spectrum conveys a negligible amount of intelligibility at the small window durations of 20–40 ms used for ASR spectral analysis, and (2) using the short-time phase spectrum directly for ASR has proven difficult from a signal processing viewpoint, owing to phase wrapping and other problems. In this article, we explore the possibility of using short-time phase spectrum information for ASR by considering these two points. To address the first point, we review the results of our own set of human listening experiments. Contrary to previous studies, our results indicate that the short-time phase spectrum can contribute significantly to speech intelligibility over small window durations of 20–40 ms. These listening experiments, together with some ASR experiments, also indicate that at least part of this intelligibility may be supplementary to that provided by the short-time magnitude spectrum. To address the second point (i.e., the signal processing difficulties), we suggest that it may be necessary to transform the short-time phase spectrum into a more physically meaningful representation from which useful features can be extracted. Specifically, we investigate the frequency derivative (the group delay function, GDF) and the time derivative (the instantaneous frequency distribution, IFD) as candidates for this intermediate representation. We review our recent work, in which various experiments show that the GDF and IFD may be useful for ASR. In that work, we also conducted several ASR experiments to test a feature set derived from the GDF, and found that, in most cases, these features perform worse than the standard MFCC features. We therefore suggest that a short-time phase spectrum feature set may ultimately be derived from a concatenation of information from both the GDF and IFD representations. For best performance, the feature set may also need to be concatenated with short-time magnitude spectrum information. Beyond these two points, we also discuss a number of other speech applications in which the short-time phase spectrum has proven very useful. We believe that an appreciation of how the short-time phase spectrum has been used for other tasks, together with the results of our own experiments, will encourage fellow researchers to investigate its potential for use in ASR.
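
As a rough illustration of the two phase-derived representations named in the abstract, the sketch below computes a group delay function and a per-bin instantaneous frequency for short frames. It is not the authors' implementation. It assumes the usual sign convention (group delay is the negative frequency derivative of phase), uses the standard identity that obtains group delay from the DFTs of x[n] and n*x[n] so that no phase unwrapping is needed, and approximates the time derivative of phase by a wrapped phase difference between two frames. All function names, the `floor` guard, and the `hop` parameter are hypothetical choices made for this example.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n_points = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
            for n, x in enumerate(frame))
        for k in range(n_points)
    ]


def group_delay(frame, floor=1e-12):
    """Group delay in samples for one frame, without phase unwrapping.

    Uses tau[k] = Re(Y[k] * conj(X[k])) / |X[k]|^2, where X is the DFT of
    x[n] and Y is the DFT of n * x[n]. `floor` is an illustrative guard
    against division by zero at spectral nulls.
    """
    spectrum = dft(frame)
    ramped = dft([n * x for n, x in enumerate(frame)])
    delays = []
    for x_k, y_k in zip(spectrum, ramped):
        power = max(abs(x_k) ** 2, floor)
        delays.append((y_k * x_k.conjugate()).real / power)
    return delays


def instantaneous_frequency(frame_a, frame_b, hop, sample_rate):
    """Per-bin instantaneous frequency in Hz from two frames `hop` samples apart.

    Finite-difference approximation to the time derivative of phase: the
    measured phase advance of each bin is compared with the advance expected
    at the bin centre, and the deviation is wrapped into [-pi, pi].
    Bins above N/2 are reported on a 0..sample_rate axis.
    """
    n_points = len(frame_a)
    spec_a, spec_b = dft(frame_a), dft(frame_b)
    freqs = []
    for k, (a_k, b_k) in enumerate(zip(spec_a, spec_b)):
        measured = cmath.phase(b_k * a_k.conjugate())
        expected = 2 * math.pi * k * hop / n_points
        deviation = measured - expected
        deviation -= 2 * math.pi * round(deviation / (2 * math.pi))
        freqs.append((k / n_points + deviation / (2 * math.pi * hop)) * sample_rate)
    return freqs
```

Under these assumptions, a unit impulse at sample 5 has a group delay of 5 samples at every bin, which makes a convenient sanity check. The instantaneous-frequency estimate is only unambiguous when the true frequency lies within sample_rate / (2 * hop) of the bin centre, so a small hop is the safer choice.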

Journal: Digital Signal Processing
Publication Date: Aug 4, 2006
Citations: 171

Similar Papers

Further intelligibility results from human listening tests using the short-time phase spectrum
Leigh D Alsteris ... Kuldip K Paliwal
Speech Communication | VOL. 48
05 Dec 2005

Significance of the Modified Group Delay Feature in Speech Recognition
Rajesh M Hegde ... Venkata Ramana Rao Gadde
IEEE Transactions on Audio, Speech and Language Processing | VOL. 15
01 Jan 2007

Writing with automatic speech recognition: Examining user’s behaviours and text quality (lexical diversity)
Walcir Cardoso ... Danial Mehdipour-Kolour
15 Aug 2023

Use of speech presence uncertainty with MMSE spectral energy estimation for robust automatic speech recognition
Anthony Stark ... Kuldip Paliwal
Speech Communication | VOL. 53
19 Aug 2010
