Abstract

We present the principle of the recently introduced harmonic-temporal clustering (HTC) framework and its applications to music and speech signal processing. HTC relies on a precise parametric description of the harmonic parts of the power spectrum through constrained Gaussian mixture models. The model parameters of all elements of the acoustic scene are estimated jointly by unsupervised 2D time-frequency clustering of the observed power density. HTC is effective for multi-pitch analysis of music signals and for F0 estimation of single- and multiple-speaker speech signals in various noisy environments. It also enables further processing of monaural music and speech signals, such as isolating or cancelling a particular part, noise reduction, and source separation.
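The abstract does not give the model equations, so the following is only a rough sketch of what a "constrained" Gaussian mixture for a harmonic spectrum might look like. It assumes a log-frequency axis on which the n-th partial sits at log F0 + log n, so that a single F0 parameter positions every Gaussian. The names `harmonic_gmm` and `fit_f0`, the partial weights, and the width `sigma` are hypothetical. The grid search over candidate F0 values is a stand-in for illustration and is not the joint unsupervised time-frequency clustering that HTC performs.

```python
import math


def harmonic_gmm(x, log_f0, weights, sigma):
    """Evaluate a harmonically constrained Gaussian mixture at log-frequency x.

    Assumption: the n-th Gaussian is centred at log_f0 + log(n), so the
    means are tied together by one free parameter (log_f0).
    """
    norm = sigma * math.sqrt(2.0 * math.pi)
    return sum(
        w * math.exp(-0.5 * ((x - (log_f0 + math.log(n))) / sigma) ** 2) / norm
        for n, w in enumerate(weights, start=1)
    )


def fit_f0(grid, observed, candidates_hz, weights, sigma):
    """Pick the candidate F0 whose model has the least squared error.

    This is a simple grid search used for illustration only.
    """
    def sq_error(f0):
        return sum(
            (harmonic_gmm(x, math.log(f0), weights, sigma) - y) ** 2
            for x, y in zip(grid, observed)
        )
    return min(candidates_hz, key=sq_error)


# Synthetic single-frame "spectrum" on a log-frequency grid from 50 Hz to 4 kHz.
lo, hi, bins = math.log(50.0), math.log(4000.0), 400
grid = [lo + i * (hi - lo) / (bins - 1) for i in range(bins)]
weights = [0.4, 0.3, 0.2, 0.1]  # hypothetical relative partial powers
observed = [harmonic_gmm(x, math.log(220.0), weights, 0.03) for x in grid]

estimate = fit_f0(grid, observed, [110.0, 220.0, 330.0, 440.0], weights, 0.03)
print(estimate)
```

In this toy setting the observation is generated from the model itself, so the search recovers the generating F0 exactly. Real signals would also need the temporal dimension, several concurrent sources, and estimation of the weights and widths, none of which the sketch attempts.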

