Speaker diarization through speaker embeddings

Mickaël Rouvier, Pierre-Michel Bousquet, Benoît Favre

doi:10.5281/zenodo.38841

Abstract

This paper proposes learning a set of high-level feature representations through deep learning, referred to as speaker embeddings, for speaker diarization. Speaker embedding features are taken from the hidden-layer neuron activations of Deep Neural Networks (DNNs) trained as classifiers to recognize a thousand speaker identities in a training set. Although learned through identification, speaker embeddings are shown to be effective for speaker verification, in particular for recognizing speakers unseen in the training set. Here, this approach is applied to speaker diarization. Experiments conducted on ETAPE, a corpus of French broadcast news, show that this new speaker modeling technique decreases the diarization error rate (DER) by 1.67 points (a relative improvement of about 8%).
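
To make the idea of reading embeddings off a hidden layer concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the weights are random rather than trained, and the layer sizes, the ReLU activation, the averaging of activations over a segment's frames, and the cosine comparison are all assumptions chosen for illustration. The names make_layer, hidden_activations, segment_embedding and cosine_similarity are hypothetical.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random (untrained) weight matrix; a real system would learn it
    by training the network to classify speaker identities."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def hidden_activations(frame, w1):
    """ReLU activations of the hidden layer for one acoustic frame."""
    n_hidden = len(w1[0])
    return [
        max(0.0, sum(frame[i] * w1[i][j] for i in range(len(frame))))
        for j in range(n_hidden)
    ]


def segment_embedding(frames, w1):
    """Average hidden activations over a segment, then L2-normalize.

    The classification (output) layer is not needed here: only the
    hidden layer is used as a feature extractor. Averaging over frames
    is an assumption of this sketch.
    """
    n_hidden = len(w1[0])
    mean = [0.0] * n_hidden
    for frame in frames:
        for j, a in enumerate(hidden_activations(frame, w1)):
            mean[j] += a / len(frames)
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]


def cosine_similarity(a, b):
    """Dot product of two unit-norm embeddings."""
    return sum(x * y for x, y in zip(a, b))


# Toy data: two segments of 50 frames with 20 features each.
rng = random.Random(0)
w1 = make_layer(n_in=20, n_out=8, rng=rng)
seg_a = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(50)]
seg_b = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(50)]

emb_a = segment_embedding(seg_a, w1)
emb_b = segment_embedding(seg_b, w1)
score = cosine_similarity(emb_a, emb_b)  # higher means more similar
```

In a diarization setting, a score of this kind could serve as the similarity between two speech segments when deciding whether to merge them into one speaker cluster. The scoring and clustering actually used in the paper are not described in the abstract.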
