Abstract

Privacy preservation methods for voice data are evolving rapidly. A recent state-of-the-art voice privacy algorithm uses an x-vector and neural source-filter (NSF)-based anonymization approach that converts the original input voice into a pseudo speaker’s voice. The method uses an affinity propagation clustering (APC) algorithm to choose the pseudo speaker’s x-vector. Choosing a suitable distance measure for this clustering technique is important for achieving optimal anonymization. To that end, this paper investigates the effect of six distance measures, namely Euclidean, cosine, probabilistic linear discriminant analysis (PLDA), correlation, Manhattan, and Mahalanobis, on voice privacy preservation using an x-vector-based anonymization system. This approach gave a 4.75% relative improvement in Equal Error Rate (EER) for original enrollment and anonymized trial utterances. In addition, an 11.49% relative improvement in EER is observed for anonymized enrollment and trial utterances. Experimental results show that the Mahalanobis and Pearson correlation coefficient-based distances are better choices for anonymization tasks. They provide better speaker de-identification and good speech intelligibility without increasing system complexity.
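To make the distance measures named above concrete, the following is a minimal Python sketch that computes five of them (Euclidean, Manhattan, cosine, Pearson correlation, and Mahalanobis) between rows of a matrix of speaker embeddings. It is an illustration under stated assumptions, not the implementation used in the paper: the function name `pairwise_distance`, the synthetic stand-in for x-vectors, and the use of a pseudo-inverse covariance for the Mahalanobis distance are all choices made here. PLDA is omitted because it requires a trained model.

```python
import numpy as np

def pairwise_distance(X, metric):
    """Return the n x n matrix of distances between the rows of X.

    Hypothetical helper for illustration; `metric` is one of
    "euclidean", "manhattan", "cosine", "correlation", "mahalanobis".
    """
    X = np.asarray(X, dtype=float)
    # diff[i, j, :] = X[i] - X[j]
    diff = X[:, None, :] - X[None, :, :]

    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    if metric == "cosine":
        # 1 - cosine similarity of the raw vectors
        U = X / np.linalg.norm(X, axis=1, keepdims=True)
        return 1.0 - U @ U.T
    if metric == "correlation":
        # 1 - Pearson correlation: cosine distance after centering each vector
        C = X - X.mean(axis=1, keepdims=True)
        U = C / np.linalg.norm(C, axis=1, keepdims=True)
        return 1.0 - U @ U.T
    if metric == "mahalanobis":
        # Assumption: covariance is estimated from X itself; the pseudo-inverse
        # keeps this usable when the estimate is singular.
        VI = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
        sq = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
        return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off
    raise ValueError(f"unknown metric: {metric}")

# Synthetic stand-in for x-vectors (random data, not real embeddings).
rng = np.random.default_rng(0)
xvectors = rng.normal(size=(10, 4))
for name in ("euclidean", "manhattan", "cosine", "correlation", "mahalanobis"):
    D = pairwise_distance(xvectors, name)
    print(name, D.shape)
```

Affinity propagation operates on similarities rather than distances, so one common choice is to pass the negated distance matrix to a clustering routine that accepts precomputed similarities; whether the paper does exactly this is not stated in the abstract.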
