Abstract

This paper presents our efforts towards developing an automatic speaker verification (ASV) system for child speakers. For most languages, children's speech data for training an ASV system is either unavailable (zero-resource) or very limited (low-resource), which makes developing such a system very challenging. To overcome this issue, we study the effectiveness of in-domain and out-of-domain data augmentation. For in-domain augmentation, additional data is created synthetically by modifying the speed and pitch of children's speech. For out-of-domain augmentation, a limited amount of adults' speech is used. Using adults' speech directly leads to severe acoustic mismatch, because the acoustic attributes of adult and child speech differ. To address this drawback, the adults' speech is subjected to voice conversion (VC) using a cycle-consistent generative adversarial network, which renders it perceptually similar to children's speech. The voice-converted adults' data can then be used for augmentation with minimal acoustic mismatch. To evaluate the proposed augmentation techniques experimentally, an x-vector-based ASV system is employed; the role of i-vectors is also studied. With data augmentation, both the equal error rate and the minimum decision cost function are reduced significantly in low- and zero-resource conditions. In addition, employing i-vectors for modeling speaker characteristics is noted to be superior. Finally, we present a detailed study of how the effect of data augmentation varies with the age of the child speakers.

