Incomplete-Data-Driven Speaker Segmentation for Diarization Application; A Help-Training Approach

Farshad Teimoori, Farbod Razzazi

doi:10.1007/s00034-018-0974-6

Abstract

This paper presents a new segmentation method for diarization applications. The method is built on a discriminative engine based on support vector regression (SVR), whose main task is to estimate the most probable change points. This engine is aided by a generative classifier in a help-training approach. Because a segmentation task provides no pre-labeled training samples, the proposed model-based segmentation method is designed to work around this obstacle. The iterative method assumes that the initial frames of a given segment belong to the associated speaker. This hypothesis allows the SVR engine to be initialized in the first iteration. In subsequent iterations, the discriminative regression block, in conjunction with the generative classifier, tags the remaining frames with advantageous (positive) and disadvantageous (negative) labels. These newly labeled frames form the working set used to update the associated speaker model. In addition to the proposed segmentation method, a new strategy is introduced to estimate inserted and deleted change points. The evaluation section, beyond the common experimental assessment, examines the statistical aspects of choosing training samples. Finally, a comparison of the proposed segmentation and diarization system with a similar method shows an improvement of approximately 22.95% in performance.

Full Text