Multiple F0 Estimation and Source Clustering of Polyphonic Music Audio Using PLCA and HMRFs

Vipul Arora, Laxmidhar Behera

doi:10.1109/taslp.2014.2387388

Abstract

Source transcription of pitched polyphonic music entails providing the pitch (F0) values of each source in a separate channel. It is a key step towards many problems in music and speech processing. It involves 1) estimating the multiple F0 values in each short time frame, and 2) clustering the F0 values into streams corresponding to different sources. We address the problem in an unsupervised way, with only the total number of sources given beforehand. The framework of probabilistic latent component analysis (PLCA) is used to decompose the polyphonic short-time magnitude spectra for multiple F0 estimation and source-specific feature extraction. It is further embedded into the structure of hidden Markov random fields (HMRFs) for clustering the F0s into different sources. This clustering is constrained by the cognitive grouping of continuous F0 contours and by the segregation of simultaneous F0s into different source streams. HMRFs model such constraints naturally. Simulated annealing varies the strength of the constraints for better clustering. The paper also proposes a novel strategy that exploits the trade-off between precision and recall of multiple F0 estimation to improve clustering. Evaluations over a variety of datasets show the efficacy of the proposed algorithm and its robustness to spurious F0s during clustering. It also outperforms a state-of-the-art unsupervised source streaming algorithm in a set of comparative experiments.
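To make the PLCA decomposition step concrete, the following is a minimal sketch of generic two-dimensional PLCA fitted by expectation-maximization, assuming a non-negative magnitude spectrogram `V` of shape (frequency bins, frames). It is not the authors' formulation: the abstract does not specify their dictionary structure, F0-specific components, or the HMRF clustering stage, so none of that is shown. The function name `plca` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def plca(V, n_components, n_iter=200, seed=0):
    """Generic PLCA via EM (illustrative sketch, not the paper's model).

    Models each frame's normalized spectrum as a mixture:
        P(f | t) ~= sum_z P(f | z) P(z | t)
    Returns W with W[f, z] = P(f | z) and H with H[z, t] = P(z | t).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    eps = 1e-12  # guards against division by zero

    # Random initialization, normalized so every column is a distribution.
    W = rng.random((n_freq, n_components))
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_components, n_frames))
    H /= H.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # E-step folded into the ratio: the posterior is
        # P(z | f, t) = W[f, z] * H[z, t] / (W @ H)[f, t].
        R = V / (W @ H + eps)

        # M-step: accumulate V[f, t] * P(z | f, t) over t (for W)
        # and over f (for H), both using the old W and H.
        W_new = W * (R @ H.T)
        H_new = H * (W.T @ R)

        # Renormalize so the columns remain probability distributions.
        W = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
        H = H_new / (H_new.sum(axis=0, keepdims=True) + eps)

    return W, H
```

In a multiple-F0 setting one would typically tie each component `z` to a candidate pitch, for example through harmonic spectral templates, and read pitch activity from `H`; that mapping is an assumption here and is not taken from the abstract.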

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Feb 1, 2015
Citations: 24

