Many-to-many voice conversion based on multiple non-negative matrix factorization

Ryo Aihara, Yasuo Ariki, Tetsuya Takiguchi

doi:10.21437/interspeech.2015-579

Abstract

We present in this paper an exemplar-based Voice Conversion (VC) method using Non-negative Matrix Factorization (NMF), which differs from conventional statistical VC. Compared with Gaussian Mixture Model (GMM)-based VC, NMF-based VC offers better noise robustness and more natural converted speech. However, because NMF-based VC relies on parallel training data from the source and target speakers, this framework cannot convert the voice of arbitrary speakers. In this paper, we propose a many-to-many VC method that makes use of Multiple Non-negative Matrix Factorization (Multi-NMF). With Multi-NMF, an arbitrary speaker’s voice can be converted into another arbitrary speaker’s voice without requiring any training data from the input or output speaker. We expect this method to be flexible because it can be applied to voice quality control or noise-robust VC.

Index Terms: voice conversion, speech synthesis, many-to-many, exemplar-based, NMF
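The abstract contrasts Multi-NMF with conventional exemplar-based NMF VC, which depends on parallel source and target training data. The Python/NumPy sketch below illustrates only that conventional parallel-dictionary baseline, not the proposed Multi-NMF method. It assumes time-aligned source and target exemplar dictionaries `A_src` and `A_tgt` whose columns are paired spectral frames, estimates non-negative activations `H` for a source spectrogram with `A_src` held fixed, and then reuses `H` with `A_tgt`. The function names, the generalized KL cost, the Lee-Seung multiplicative update, the iteration count, and the random initialization are all assumptions chosen for illustration.

```python
import numpy as np


def kl_divergence(X, Y, eps=1e-12):
    """Generalized KL divergence D(X || Y) between non-negative matrices."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))


def estimate_activations(X, A, n_iter=200, eps=1e-12, seed=0):
    """Estimate H >= 0 such that X ~= A @ H, keeping dictionary A fixed.

    Assumption: Lee-Seung multiplicative updates for the generalized KL
    divergence, which never increase D(X || A @ H) from one step to the next.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], X.shape[1])) + eps  # random non-negative init
    col_sums = A.sum(axis=0)[:, None]               # A^T 1, shape (K, 1)
    for _ in range(n_iter):
        AH = A @ H + eps
        H *= (A.T @ (X / AH)) / (col_sums + eps)
    return H


def convert(X_src, A_src, A_tgt, n_iter=200):
    """Hypothetical baseline conversion: activations from the source
    dictionary are combined with the paired target exemplars."""
    H = estimate_activations(X_src, A_src, n_iter=n_iter)
    return A_tgt @ H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    F, K, T = 20, 10, 30             # frequency bins, exemplars, frames
    A_src = rng.random((F, K))       # stand-in source exemplars
    A_tgt = rng.random((F, K))       # stand-in paired target exemplars
    X = A_src @ rng.random((K, T))   # synthetic source spectrogram
    Y = convert(X, A_src, A_tgt)
    print(Y.shape)
```

Because the activations are shared between the two dictionaries, the converted output can only be built from target exemplars that are paired with source exemplars. This dependence on parallel training data is the limitation that, according to the abstract, Multi-NMF removes.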
