Abstract

Speaker Change Detection (SCD) is the task of segmenting an input audio-recording according to speaker interchanges. Nowadays, many applications, such as Speaker Diarization (SD) or automatic vocal transcription, depend on this segmentation task. In this paper, we focus on the essential task of the SD problem, the audio segmenting process, and suggest a solution for the SCD problem, as well as the assignment of clustered speaker labels for the extracted segments, and applying the solution over two datasets: a commercial dataset in Hebrew and the ICSI Meeting Corpus. As such, we propose a hybrid framework for the SCD problem that is learned by textual information and speech signals and the meta-data features that can be extracted from them. Moreover, we demonstrate the negative correlation between an increase in the number of speakers in the training dataset and the influence on the overall diarization system's performance, which is improved using our efficient SCD component. Finally, we show how our proposed hybrid framework remains robust compared to the ICSI Meeting Corpus, as the experimental evaluation's training and testing is based on two languages.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.