Music and speech signal processing using harmonic‐temporal clustering

Jonathan Le Roux, Nobutaka Ono, Hirokazu Kameoka, Shigeki Sagayama, Alain De Cheveigne

doi:10.1121/1.2935509

Abstract

We present the principle of the recently introduced harmonic‐temporal clustering (HTC) framework and its applications to music and speech signal processing. HTC relies on a precise parametric description of the harmonic parts of the power spectrum through constrained Gaussian mixture models. The model parameters of all the elements of the acoustical scene are estimated jointly by an unsupervised 2D time‐frequency clustering of the observed power density. HTC is effective for multi‐pitch analysis of music signals and for F0 estimation of single‐speaker and multiple‐speaker speech signals in various noisy environments. It also enables further processing of monaural music and speech signals, such as isolation or cancellation of a particular part, noise reduction, and source separation.
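To make the idea of a constrained Gaussian mixture concrete, the following is a minimal sketch of a harmonic model for one source in a single time frame. It is an illustration under stated assumptions, not the authors' implementation: the log-frequency axis, the shared width `sigma`, the function name `harmonic_model`, and all numerical values are hypothetical choices made for the example, and the temporal part of the model and the clustering step are omitted.

```python
import math


def harmonic_model(x, log_f0, weights, sigma, total_power=1.0):
    """Power density at log-frequency x for one harmonic source in one frame.

    Assumed constraint: the n-th Gaussian is centred at log_f0 + log(n),
    so the single parameter log_f0 moves all partials together.
    `weights` are the relative powers of the partials and should sum to 1.
    """
    norm = sigma * math.sqrt(2.0 * math.pi)
    density = 0.0
    for n, w in enumerate(weights, start=1):
        mean = log_f0 + math.log(n)  # position of the n-th partial
        density += w * math.exp(-0.5 * ((x - mean) / sigma) ** 2) / norm
    return total_power * density


# Illustrative values only (not taken from the paper).
weights = [0.5, 0.3, 0.2]   # relative power of partials 1..3
log_f0 = math.log(220.0)    # fundamental at 220 Hz
sigma = 0.02                # shared width on the log-frequency axis

peak_at_f0 = harmonic_model(math.log(220.0), log_f0, weights, sigma)
peak_at_2f0 = harmonic_model(math.log(440.0), log_f0, weights, sigma)
between = harmonic_model(math.log(311.0), log_f0, weights, sigma)

# Riemann sum over 100 Hz .. 1000 Hz on the log-frequency axis.
step = 0.001
start = math.log(100.0)
n_steps = int((math.log(1000.0) - start) / step)
integral = sum(
    harmonic_model(start + i * step, log_f0, weights, sigma) * step
    for i in range(n_steps)
)
```

Because the partial positions are tied to one fundamental, fitting such a model to an observed spectrum amounts to estimating `log_f0`, the partial weights, and the width, rather than a free mean for every Gaussian.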
