Abstract

Speaker diarization detects speaker change points in spoken data and organizes the resulting segments into clusters so that each cluster contains the segments of a single speaker. This study aims to develop online speaker diarization for multimedia data retrieval on mobile devices. Researchers have proposed various diarization methods, but most existing approaches either depend on an empirically determined threshold as a decision criterion or operate offline and require prior knowledge, such as the total number of speakers. These requirements are clear drawbacks on mobile devices, where various types of spoken data are frequently played and replaced. To overcome these drawbacks, we propose a new approach to online speaker segmentation and clustering. The proposed segmentation method exploits the temporal locality of an analysis window, assuming that each window contains only a small number of speakers. Based on this property, a local universal background model (UBM) is constructed within each window and used to detect speaker change points. For speaker clustering, we propose a cluster boundary-based dynamic decision criterion, which estimates the internal characteristics of each cluster and uses them to determine that cluster's boundary. In experiments on a broadcast news corpus, the proposed techniques outperformed conventional approaches.
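The abstract does not give the exact form of the cluster boundary, so the following is only a minimal illustrative sketch, not the authors' method. It assumes that each speech segment is already summarized by a fixed-length feature vector and that a cluster's boundary can be approximated as the mean plus `alpha` standard deviations of its members' distances to the centroid. A new segment joins the nearest cluster only if it falls inside that cluster's boundary; otherwise it opens a new cluster. The names `Cluster`, `OnlineClusterer`, `alpha`, and `init_radius` are hypothetical, and `init_radius` is an assumed floor for clusters that are too small to estimate a spread.

```python
import math
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Cluster:
    """A speaker cluster holding the feature vectors of its segments."""
    members: List[Vector] = field(default_factory=list)

    def centroid(self) -> Vector:
        n = len(self.members)
        dim = len(self.members[0])
        return [sum(m[d] for m in self.members) / n for d in range(dim)]

    def boundary(self, alpha: float, init_radius: float) -> float:
        """Hypothetical boundary: mean + alpha * std of member-to-centroid
        distances, floored at init_radius so small clusters are not over-tight."""
        if len(self.members) < 2:
            return init_radius
        c = self.centroid()
        dists = [euclidean(m, c) for m in self.members]
        mean = sum(dists) / len(dists)
        var = sum((d - mean) ** 2 for d in dists) / len(dists)
        return max(mean + alpha * math.sqrt(var), init_radius)


class OnlineClusterer:
    """Assigns segments one at a time, deciding with each cluster's own
    boundary instead of a single global distance threshold."""

    def __init__(self, alpha: float = 2.0, init_radius: float = 1.0):
        self.alpha = alpha
        self.init_radius = init_radius
        self.clusters: List[Cluster] = []

    def add(self, x: Vector) -> int:
        # Find the nearest existing cluster by centroid distance.
        best, best_dist = -1, math.inf
        for i, c in enumerate(self.clusters):
            d = euclidean(x, c.centroid())
            if d < best_dist:
                best, best_dist = i, d
        # Join it only if the segment lies inside that cluster's boundary.
        if best >= 0 and best_dist <= self.clusters[best].boundary(
                self.alpha, self.init_radius):
            self.clusters[best].members.append(x)
            return best
        # Otherwise treat the segment as a new speaker.
        self.clusters.append(Cluster([x]))
        return len(self.clusters) - 1
```

For example, feeding the vectors `[0, 0]`, `[0.1, 0]`, `[5, 5]`, `[0, 0.2]`, `[5.1, 5]` in order yields the labels `[0, 0, 1, 0, 1]`. Because each boundary is recomputed from the cluster's current members, a speaker with more varied segments gets a wider acceptance region, which is the intuition behind a per-cluster dynamic criterion rather than one fixed threshold.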
