Singing speaker clustering based on subspace learning in the GMM mean supervector space

Mahnoosh Mehrabani, John H.L Hansen

doi:10.1016/j.specom.2012.11.001

Open Access

Journal: Speech Communication
Publication Date: Feb 11, 2013
Citations: 48
License type: CC BY-NC-ND

Affiliation: The University of Texas at Dallas

Abstract

In this study, we propose algorithms based on subspace learning in the GMM mean supervector space to improve the performance of speaker clustering with speech from both reading and singing. As a speaking style, singing introduces changes in the time-frequency structure of a speaker’s voice. The purpose of this study is to introduce advancements for speech systems, such as speech indexing and retrieval, that improve robustness to intrinsic variations in speech production. Speaker clustering techniques such as k-means and hierarchical clustering are explored to analyze acoustic space differences in a corpus consisting of reading and singing of lyrics by each speaker. Furthermore, a distance based on fuzzy c-means membership degrees is proposed to more accurately measure clustering difficulty, or speaker confusability. Two categories of subspace learning methods are studied: unsupervised, based on locality preserving projections (LPP), and supervised, based on probabilistic linear discriminant analysis (PLDA). Our proposed PLDA-based clustering method is a two-stage algorithm: first, initial clusters are obtained using full-dimension supervectors; next, each cluster is refined in a PLDA subspace, resulting in a more speaker-dependent representation that is less sensitive to speaking style. It is shown that LPP improves average clustering accuracy by 5.1% absolute versus a hierarchical baseline for a mixture of reading and singing, and PLDA-based clustering increases accuracy by 9.6% absolute versus a k-means baseline. The advancements offer novel techniques to improve model formulation for speech applications including speaker ID, audio search, and audio content analysis.
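The abstract does not define the proposed distance itself, so the following is only a minimal sketch of the standard fuzzy c-means membership degrees on which such a measure would be built. The function name `fcm_memberships`, the fuzzifier `m=2.0`, and the toy 2-D points are illustrative assumptions, not taken from the paper; real GMM mean supervectors are high-dimensional.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership degrees of one point to each center.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the Euclidean
    distance from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

# Hypothetical cluster centers standing in for two speakers.
centers = [(0.0, 0.0), (4.0, 0.0)]

# A point close to the first center gets a membership near 1 for it.
clear_case = fcm_memberships((0.5, 0.0), centers)

# A point midway between the centers gets equal memberships, which is
# the kind of ambiguity a confusability measure could pick up on.
confusable_case = fcm_memberships((2.0, 0.0), centers)
```

Memberships always sum to 1 across clusters, so near-uniform values indicate a sample that is hard to assign, while a single dominant value indicates an easy one. How the paper turns these degrees into a distance is not stated in the abstract.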

Similar Papers

Speaker clustering for a mixture of singing and reading
Mahnoosh Mehrabani ... John H L Hansen
09 Sep 2012

Partially Supervised Speaker Clustering
Hao Tang ... T S Huang
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 34
01 May 2012

Integrated approach to multimodal media content analysis
Tong Zhang ... C.-C Jay Kuo
23 Dec 1999

Temporal video segmentation: detecting the end-of-act in circus performance videos
Lukman H Iwan ... James A Thom
Multimedia Tools and Applications | VOL. 76
09 Dec 2015
