Speaker Diarization: A Review

Krishna Kumar

doi:10.55041/ijsrem24075

Abstract

Speaker diarization is the task of determining "who spoke when" in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers. In other words, it is the process of labeling a speech signal with labels corresponding to the identity of the speakers. It is a crucial task in audio signal processing and speech analysis, and a challenging one because of the variability of human speech, the presence of overlapping speech, and the lack of prior information about the speakers. This paper reviews speaker diarization research since 2018, discussing the historical development of speaker diarization technology and recent advances in neural speaker diarization approaches.

Key Words: speaker diarization, speaker clustering, speaker embeddings

Full Text