Abstract

The channelized instantaneous frequency of a signal x(t) is CIFx(ω,T)=(∂/∂T)arg(Fh(ω,T)), where Fh is the short-time Fourier transform (STFT) of x(t) using window function h. The local group delay of the signal is LGDx(ω,T)=−(∂/∂ω)arg(Fh(ω,T)). For each point Fh(ω0,T0) in the STFT, the frequency-time coordinates [CIF(ω0,T0),t−LGD(ω0,T0)] pinpoint the local mean of the Rihaczek distribution of complex signal energy, and reassigning the STFT magnitude to these coordinates yields a spectrogram from which much of the blurring is removed. Two algorithms that compute such a reassigned spectrogram are exemplified and evaluated. One is based on the theory of Nelson [“Cross-spectral methods for processing speech,” J. Acoust. Soc. Am. 110(5), 2575–2592 (2001)]; the second implements the equations of Auger and Flandrin [“Improving the readability of time-frequency and time-scale representations by the reassignment method,” IEEE Trans. Signal Process. 43(5), 1068–1089 (1995)]. The empirical performance of each technique is evaluated qualitatively; both methods dramatically improve upon the standard spectrogram, and both surpass the naive benchmark provided by first-difference approximation of the phase derivatives. High-resolution spectrograms are provided for both music and speech signals; e.g., the time-frequency nature of a single glottal pulsation can be observed. Applications to sound morphing are also demonstrated, e.g., morphing voices and musical instruments.
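
As an illustration of the reassignment idea, the following is a minimal NumPy sketch in the spirit of the auxiliary-window approach attributed above to Auger and Flandrin. It is not the authors' code. The function name `reassigned_spectrogram`, the Hann window, the frame parameters, and the sign conventions (chosen for a frame-local STFT with the window centered at local time zero) are all assumptions made for this example. The phase derivatives are not differenced directly; instead, two extra STFTs are taken with the time-weighted window t·h(t) and the derivative window dh/dt, and their ratios to the ordinary STFT give the time and frequency corrections.

```python
import numpy as np


def reassigned_spectrogram(x, fs, n_win=256, hop=64, n_fft=512):
    """Sketch of spectrogram reassignment with auxiliary windows.

    Returns (t_hat, f_hat, power), each of shape (n_frames, n_fft // 2 + 1).
    t_hat is in seconds and f_hat in Hz; both are NaN where the STFT
    has negligible energy. All parameter names are hypothetical.
    """
    x = np.asarray(x, dtype=float)
    k = np.arange(n_win)
    center = (n_win - 1) / 2.0

    # Analysis window h, time-weighted window t*h, and derivative window dh/dt.
    h = np.hanning(n_win)
    th = (k - center) * h  # local time in samples
    # Analytic derivative of np.hanning with respect to the sample index.
    dh = (np.pi / (n_win - 1)) * np.sin(2.0 * np.pi * k / (n_win - 1))

    # Slice the signal into overlapping frames.
    starts = np.arange(0, len(x) - n_win + 1, hop)
    frames = np.stack([x[s:s + n_win] for s in starts])

    # Three STFTs that differ only in the window applied to each frame.
    S_h = np.fft.rfft(frames * h, n_fft, axis=1)
    S_th = np.fft.rfft(frames * th, n_fft, axis=1)
    S_dh = np.fft.rfft(frames * dh, n_fft, axis=1)

    power = np.abs(S_h) ** 2
    valid = power > 1e-12 * power.max()
    safe = np.where(valid, S_h, 1.0)  # avoid division by zero

    # Corrections: time offset in samples, frequency offset in rad/sample.
    t_corr = np.real(S_th / safe)
    w_corr = np.imag(S_dh / safe)

    t_center = (starts + center)[:, None]
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)[None, :]

    t_hat = np.where(valid, (t_center + t_corr) / fs, np.nan)
    f_hat = np.where(valid, freqs - w_corr * fs / (2.0 * np.pi), np.nan)
    return t_hat, f_hat, power
```

To draw a reassigned spectrogram from this output, the values in `power` would be accumulated into a time-frequency grid at the coordinates (`t_hat`, `f_hat`), for example with `np.histogram2d` and `weights=power`, rather than plotted at the original frame and bin centers.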
