Abstract

Many applications related to speaker characterization, especially in telephone environments, have large datasets available that are not directly usable because every recording involves two speakers. Even with very accurate speaker diarization systems, we can expect some recordings to have low diarization accuracy. Using these recordings may reduce the accuracy of any speaker characterization technology. It is therefore highly desirable to detect the recordings in which the speakers are correctly segmented, so that the remaining ones can be discarded or processed manually before being fed into the application. In this work we propose a set of confidence measures that assess the quality of a hypothesized diarization output in order to detect the recordings that are correctly segmented. We show that these confidence measures enable us to retrieve most of the desired recordings from a given dataset, discarding those that degrade the overall accuracy of an application that makes use of speaker characterization technologies.
