Abstract

In music transcription and musical source separation, it is important not only to know the multi-F0 contours but also which F0 was played by which instrument. This paper focuses on that aspect: given polyphonic audio along with its multiple F0 contours, the proposed system clusters the contours so as to decide ‘which instrument played when.’ Many supervised methods exist for identifying the instruments or singers in polyphonic audio, but isolated source audio is often unavailable for training. To address this problem, this paper proposes novel semi-supervised and unsupervised schemes for source clustering. The proposed theoretical framework is based on auditory perception theory and is implemented using tools such as probabilistic latent component analysis and graph clustering, while taking into account various perceptual cues for characterizing a source. Experiments have been carried out over a wide variety of datasets, ranging from vocal to instrumental and from synthetic to real-world music. The proposed scheme significantly outperforms a state-of-the-art unsupervised scheme that does not make use of the given F0 contours. The proposed semi-supervised approach also outperforms another semi-supervised scheme that uses the given F0 information, in terms of both computational cost and accuracy.
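To make the idea of graph-based clustering of F0 contours concrete, the following is a minimal sketch, not the paper's actual method: each contour is a node, edges link contours that share a perceptual cue, and connected components of the resulting graph become instrument clusters. The cues used here (similar pitch register, no temporal overlap, on the assumption of monophonic sources) and all thresholds are illustrative assumptions.

```python
import math

def mean_log_f0(contour):
    """Average log2-frequency (octaves) of a contour given as [(time, f0), ...]."""
    return sum(math.log2(f0) for _, f0 in contour) / len(contour)

def overlap(a, b):
    """Temporal overlap (seconds) between the spans of two contours."""
    s1, e1 = min(t for t, _ in a), max(t for t, _ in a)
    s2, e2 = min(t for t, _ in b), max(t for t, _ in b)
    return max(0.0, min(e1, e2) - max(s1, s2))

def same_source(a, b, max_register_gap=0.5, max_overlap=0.0):
    """Link two contours when their registers are within max_register_gap
    octaves and they never sound simultaneously (a monophonic-source cue).
    These cues and thresholds are illustrative, not the paper's."""
    return (abs(mean_log_f0(a) - mean_log_f0(b)) <= max_register_gap
            and overlap(a, b) <= max_overlap)

def cluster_contours(contours, **kw):
    """Return clusters of contour indices: connected components of the
    similarity graph, computed with union-find."""
    n = len(contours)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if same_source(contours[i], contours[j], **kw):
                parent[find(i)] = find(j)  # union
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Toy usage: two non-overlapping low-register contours (one source)
# and one high-register contour (another source).
contours = [
    [(0.0, 110.0), (0.5, 110.0)],   # low register, 0.0-0.5 s
    [(1.0, 123.5), (1.5, 123.5)],   # low register, 1.0-1.5 s
    [(0.0, 880.0), (1.5, 880.0)],   # high register, spans both
]
print(cluster_contours(contours))   # two clusters: {0, 1} and {2}
```

The paper's framework additionally uses PLCA-derived features and richer perceptual cues; this sketch only illustrates the final graph-clustering step.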
