Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing

Benjamin Van Niekerk, Herman Kamper, Leanne Nortje, Matthew Baas

doi:10.21437/interspeech.2021-1182

Abstract

Contrastive predictive coding (CPC) aims to learn representations of speech by distinguishing future observations from a set of negative examples. Previous work has shown that linear classifiers trained on CPC features can accurately predict speaker and phone labels. However, it is unclear how the features actually capture speaker and phonetic information, and whether it is possible to normalize out the irrelevant details (depending on the downstream task). In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent. Concretely, we find that comparing means performs well on a speaker verification task. Next, probing experiments show that standardizing the features effectively removes speaker information. Based on this observation, we propose a speaker normalization step to improve acoustic unit discovery using K-means clustering of CPC features. Finally, we show that a language model trained on the resulting units achieves some of the best results in the ZeroSpeech 2021 Challenge.
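
The two operations named in the abstract, comparing per-utterance means and standardizing features, can be illustrated with a minimal sketch. This is not the authors' implementation: the features are synthetic stand-ins for CPC features, the function names are hypothetical, cosine similarity is an assumed choice of comparison, and standardization is assumed to be applied per utterance and per dimension.

```python
import numpy as np


def utterance_mean(features: np.ndarray) -> np.ndarray:
    """Average a (frames, dims) feature matrix over the time axis."""
    return features.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors (assumed comparison measure)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def standardize_utterance(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Shift each dimension to zero mean and scale it to unit variance.

    Assumption: statistics are computed over the frames of one utterance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


# Synthetic stand-in for CPC features: each "speaker" adds a fixed offset
# to frame-level noise. Real features would come from a trained CPC model.
rng = np.random.default_rng(0)
dims = 16
offset_a = 3.0 * rng.normal(size=dims)
offset_b = 3.0 * rng.normal(size=dims)
utt_a1 = rng.normal(size=(200, dims)) + offset_a
utt_a2 = rng.normal(size=(150, dims)) + offset_a
utt_b = rng.normal(size=(180, dims)) + offset_b

# Verification-style comparison: means from the same speaker should score
# higher than means from different speakers.
same_score = cosine_similarity(utterance_mean(utt_a1), utterance_mean(utt_a2))
diff_score = cosine_similarity(utterance_mean(utt_a1), utterance_mean(utt_b))

# Standardization removes the per-utterance offset that carried the
# speaker cue in this toy setup.
normalized = standardize_utterance(utt_a1)
```

In this toy setup the standardized frames would then be passed to a clustering step such as K-means, in line with the speaker normalization step the abstract proposes for acoustic unit discovery.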
