Diarization Error Rate Research Articles

Deep speaker embedding extraction models have recently served as the cornerstone for modular speaker diarization systems. However, in current modular systems, the extracted speaker embeddings (i.e., speaker features) do not effectively leverage their intrinsic relationships and are not tailored specifically to the clustering task. In this paper, inspired by deep embedded clustering (DEC), we propose a speaker diarization method based on graph attention-based deep embedded clustering (GADEC) to address these issues. First, given the temporal nature of speech, when a speech signal is split into short segments, the current segment and its neighboring segments are likely to belong to the same speaker. This suggests that embeddings extracted from neighboring segments could help generate a more informative speaker representation for the current segment. To better describe the complex relationships between segments and leverage the local structural information among their embeddings, we construct a graph over the pre-extracted speaker embeddings of a continuous audio signal. On this basis, we introduce a graph attentional encoder (GAE) module that integrates information from neighboring nodes (i.e., neighboring segments) in the graph and learns latent speaker embeddings. Moreover, we jointly optimize the latent speaker embeddings and the clustering results within a unified framework, leading to more discriminative speaker embeddings for the clustering task. Experimental results demonstrate that the proposed GADEC-based speaker diarization system significantly outperforms the baseline systems and several other recent speaker diarization systems in terms of diarization error rate (DER) on the NIST SRE 2000 CALLHOME, AMI, and VoxConverse datasets.
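The idea of enriching each segment embedding with information from its temporal neighbors can be illustrated with a minimal sketch. It is not the GADEC method from the abstract: the fixed neighbor window, the cosine-similarity weights (standing in for learned attention), and the function names `temporal_neighbors` and `aggregate` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def temporal_neighbors(n_segments, window=1):
    """Adjacency list: each segment is linked to itself and to the
    segments at most `window` steps away in time (assumed graph rule)."""
    return [
        list(range(max(0, i - window), min(n_segments, i + window + 1)))
        for i in range(n_segments)
    ]


def aggregate(embeddings, window=1):
    """Replace each embedding by a softmax-weighted average of its
    temporal neighbors, weighted by cosine similarity (not learned)."""
    neighbors = temporal_neighbors(len(embeddings), window)
    refined = []
    for i, nbrs in enumerate(neighbors):
        scores = [cosine(embeddings[i], embeddings[j]) for j in nbrs]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(embeddings[i])
        refined.append(
            [sum(w * embeddings[j][d] for w, j in zip(weights, nbrs)) for d in range(dim)]
        )
    return refined
```

In a real system the attention weights would be learned and the refined embeddings would be optimized jointly with the clustering objective, as the abstract describes; the sketch only shows the neighbor-aggregation step.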

Speaker diarization is one of the more complex problems in speaker recognition. A reliable diarization system should accurately determine the variable-length utterances that each speaker contributes to a multi-speaker conversation. This is difficult because text-independent speaker identification and verification are not yet reliable enough for this purpose. While efficient speaker modelling is important for diarization, the acoustic representation of speech is the basic entity that characterizes a speaker. This representation should be distinctive enough to prevent a speaker’s utterances from being lost in the acoustic congestion created by the other talkers. For this purpose, it is proposed here that, in multiple-microphone diarization, the multiple speech signals be used directly in acoustic feature extraction instead of being combined beforehand. The aim is to make optimal use of those signals in order to enrich the acoustic representation of the speaker. To this end, and since not all microphone signals (channels) may be desirable, two selection approaches are proposed: a best-quality channel selection method and a novel approach for diverse channel selection. Furthermore, a novel method is proposed that retains the speech spectrum from the selected least-reverberated subbands of the available channels’ spectra. A new model, referred to here as Averaged Joint Gradient (AJG), is introduced for this purpose. The proposed approach reduces the Diarization Error Rate (DER) in both diarization systems used in the evaluations. The first system is based on binary keys and achieves a maximum relative reduction in DER of 14%. The second is a Gaussian Mixture Model-Bayesian Information Criterion (GMM-BIC) based system, which achieves a maximum relative reduction in DER of 20%.
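For readers unfamiliar with the metric, the following minimal sketch shows how DER and a relative DER reduction are commonly computed. It assumes the standard definition (false alarm, missed speech, and speaker confusion time divided by total reference speech time); the function names and the example durations are hypothetical and are not taken from either paper.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """DER as the sum of the three error durations over the total
    reference speech duration (all arguments in the same time unit)."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


def relative_reduction(baseline_der, new_der):
    """Relative DER reduction of a new system with respect to a baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical durations in seconds, for illustration only.
baseline = diarization_error_rate(false_alarm=4.0, missed=6.0, confusion=15.0, total_speech=100.0)
improved = diarization_error_rate(false_alarm=4.0, missed=6.0, confusion=10.0, total_speech=100.0)
print(f"baseline DER = {baseline:.2%}, improved DER = {improved:.2%}")
print(f"relative reduction = {relative_reduction(baseline, improved):.0%}")
```

Note that a relative reduction is measured against the baseline DER, so a 20% relative reduction from a 25% baseline corresponds to an absolute DER of 20%, not 5%.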

Articles published on Diarization Error Rate

Conversations in the wild: Data collection, automatic generation and evaluation

Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation

Speaker diarization with variants of self-attention and joint speaker embedding extractor

Graph attention-based deep embedded clustering for speaker diarization

Blueprint Separable Subsampling and Aggregate Feature Conformer-Based End-to-End Neural Diarization

End-to-end neural speaker diarization with an iterative adaptive attractor estimation

ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding

QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization

Multi-Target Extractor and Detector for Unknown-Number Speaker Diarization

Towards developing speaker diarization for parent-child interactions

The Impact of Speaker Diarization on DNN-based Autism Severity Estimation

A hybrid HXPLS‐TMFCC parameterization and DCNN‐SFO clustering based speaker diarization system

Channel and channel subband selection for speaker diarization

Active Correction for Incremental Speaker Diarization of a Collection with Human in the Loop

Speaker Naming in Arabic TV Programs

Singer Diarization for Polyphonic Music With Unison Singing

Speaker Diarisation of Vibroacoustic Intelligence from Drone Mounted Laser Doppler Vibrometers

Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization

Supervised Speaker Diarization Using Random Forests: A Tool for Psychotherapy Process Research
