Novel Approaches to Speaker Clustering for Speaker Diarization in Audio Broadcast News Data

Janez Žibert, France Mihelič

doi:10.5772/6386

Abstract

The growing demand to shift content-based information retrieval from text to various multimedia sources means there is an increasing need to deal with large amounts of multimedia information. The data provided by television and radio broadcast news (BN) programs are just one example of such a source of information. In our research we focus on the processing and analysis of audio BN data, where the main information source is speech. The main issues in our work relate to the preparation and organization of BN audio data for further processing in audio information-retrieval systems based on speech technologies. This chapter addresses the problem of structuring the audio data in terms of speakers, i.e., finding the regions in the audio streams that belong to a single speaker and then joining the regions of the same speaker together. The task of organizing the audio data in this way is known as speaker diarization and was first introduced in the NIST Rich Transcription project in the “Who spoke when” evaluations (Fiscus et al., 2004; Tranter & Reynolds, 2006). The speaker-diarization problem is composed of several stages, in which three main tasks are performed: speech detection, speaker- and background-change detection, and speaker clustering. While the aim of the speech-detection and the speaker- and acoustic-segmentation procedures is to provide the proper segmentation of the audio data streams, the purpose of speaker clustering is to join together segments that belong to the same speaker; it is usually applied in the last stage of the speaker-diarization process. In this chapter we focus on speaker-clustering methods: we develop suitable representations of the speaker segments for clustering, examine different similarity measures for joining the speaker segments, and explore different stopping criteria for the clustering, with the aim of minimizing the overall diarization error of such systems.
The chapter is organized as follows: In Section 2, two baseline speaker-clustering approaches are presented. The first is a standard approach using a bottom-up agglomerative clustering principle with the Bayesian information criterion as the merging criterion. In the second system an alternative approach is applied, also using bottom-up clustering, but the representations of the speaker segments are modeled by Gaussian mixture models, and for ...
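
To make the first baseline more concrete, the following is a minimal sketch of bottom-up agglomerative clustering that uses a BIC-based score both to pick the pair to merge and to decide when to stop. It is not the authors' implementation: the single full-covariance Gaussian per cluster, the penalty weight `lam`, the stopping rule (stop when no pair has a negative score), and the function names `delta_bic` and `bic_cluster` are all assumptions made for illustration.

```python
import numpy as np


def _logdet_cov(frames):
    # Log-determinant of the maximum-likelihood covariance of the frames.
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between two clusters of feature frames (rows = frames).

    Negative values favour modelling x and y with one Gaussian (merge);
    positive values favour keeping them separate. `lam` is an assumed
    tunable penalty weight.
    """
    n1, d = x.shape
    n2 = y.shape[0]
    n = n1 + n2
    # Number of free parameters of one full-covariance Gaussian.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * lam * n_params * np.log(n)
    gain = 0.5 * (n * _logdet_cov(np.vstack([x, y]))
                  - n1 * _logdet_cov(x)
                  - n2 * _logdet_cov(y))
    return gain - penalty


def bic_cluster(segments, lam=1.0):
    """Greedy bottom-up clustering; returns lists of segment indices."""
    clusters = [(np.asarray(s, dtype=float), [i])
                for i, s in enumerate(segments)]
    while len(clusters) > 1:
        # Find the pair of clusters with the lowest delta-BIC.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i][0], clusters[j][0], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # assumed stopping criterion: no merge is favoured
            break
        merged = (np.vstack([clusters[i][0], clusters[j][0]]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(ids) for _, ids in clusters]
```

Each element of `segments` is assumed to be an array of acoustic feature vectors (for example, one row per frame) for one speech segment produced by the earlier segmentation stages. In practice the penalty weight is usually tuned on development data, since it directly controls how many clusters survive.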

Publication Date: Nov 1, 2008
Citations: 25	License type: cc-by-sa

Similar Papers

Robust audio segmentation
01 Jan 2004

Fusion of Acoustic and Prosodic Features for Speaker Clustering
Janez Žibert ... France Mihelič
01 Jan 2009

Robust Unsupervised Speaker Segmentation for Audio Diarization
Hachem Kadri ... Manuel Davy
01 Mar 2010

DNN-Based Speaker Clustering for Speaker Diarisation
Rosanna Milner ... Thomas Hain
08 Sep 2016
