Quality Assessment for Speaker Diarization and Its Application in Speaker Characterization

C Vaquero, A Ortega, Eduardo Lleida, A Miguel

doi:10.1109/tasl.2012.2236317

Abstract

Many applications rely on speaker characterization, especially in telephone environments, where large datasets are available but not directly usable because every recording involves two speakers. Even with very accurate speaker diarization systems, we can expect to find some recordings with low diarization accuracy, and using these recordings may reduce the accuracy of any speaker characterization technology. It is therefore highly desirable to detect the recordings in which the speakers are correctly segmented, so that the remaining ones can be discarded or processed manually before being fed into the application. In this work we propose a set of confidence measures to assess the quality of a hypothetical diarization output, in order to detect the recordings that are correctly segmented. We show that these confidence measures enable us to retrieve most of the desired recordings from a given dataset while discarding those that degrade the overall accuracy of an application that makes use of speaker characterization technologies.

Full Text