In-Domain and Out-of-Domain Data Augmentation to Improve Children’s Speaker Verification System in Limited Data Scenario

S Shahnawazuddin, Avinash Kumar, Nagaraj Adiga, Waquar Ahmad

doi:10.1109/icassp40776.2020.9053891

Abstract

In this paper, we present our efforts towards developing a robust automatic speaker verification (ASV) system for children when domain-specific data is limited. To that end, we study the effect of in-domain and out-of-domain data augmentation, exploring several combinations of augmentation strategies. For in-domain augmentation, speed and pitch perturbation of children’s speech are used to synthetically create additional data. For out-of-domain augmentation, adults’ speech is pooled with children’s speech. In addition, voice conversion (VC) is applied to adults’ speech to alter its acoustic attributes so that it becomes perceptually similar to children’s speech, and the converted adults’ data is then used for augmentation. The ASV systems developed in this study employ x-vectors derived using a time-delay deep neural network, with probabilistic linear discriminant analysis used for scoring. The explored data augmentation methods are noted to reduce the equal error rate as well as the minimum decision cost function by a large margin.
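As a rough illustration of the in-domain augmentation mentioned in the abstract, the sketch below applies speed perturbation to a waveform by resampling it. This is not the authors' implementation: the function names `speed_perturb` and `augment`, the factors 0.9 and 1.1, and the use of linear interpolation in place of a band-limited resampler are all assumptions made for illustration. Note that resampling in this way changes duration and shifts pitch together; perturbing pitch alone requires additional processing that is not shown here.

```python
import math


def speed_perturb(samples, factor):
    """Return `samples` resampled so that it plays `factor` times faster.

    Hypothetical helper: linear interpolation stands in for the
    band-limited resampler a production toolkit would normally use.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    # A faster signal (factor > 1) has fewer samples at the same sample rate.
    n_out = max(1, int(round(len(samples) / factor)))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis, clamped to the end.
        pos = min(i * factor, last)
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(samples, factors=(0.9, 1.0, 1.1)):
    """Build one copy of the utterance per speed factor (factors are assumed)."""
    return {f: speed_perturb(samples, f) for f in factors}


if __name__ == "__main__":
    # One second of a 200 Hz tone at an assumed 16 kHz sample rate.
    tone = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
    for factor, wav in augment(tone).items():
        print(factor, len(wav))
```

Each perturbed copy would be added to the training pool under the original speaker label, so the amount of in-domain data grows without new recordings.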

Full Text