Voice Conversion Using RNN Pre-Trained by Recurrent Temporal Restricted Boltzmann Machines

Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki

doi:10.1109/taslp.2014.2379589


Open Access


Abstract

This paper presents a voice conversion (VC) method that uses recently proposed probabilistic models called recurrent temporal restricted Boltzmann machines (RTRBMs). One RTRBM is used per speaker to capture high-order temporal dependencies in an acoustic sequence. Our algorithm first trains two RTRBMs separately, one for the source speaker and one for the target speaker, each on speaker-dependent training data. Because each RTRBM attempts to discover abstractions that maximally express the training data at each time step, as well as the temporal dependencies in the training data, we expect the models to represent linguistic-related latent features in high-order spaces. We then convert (match) these high-order features of the source speaker to those of the target speaker using a neural network (NN), so that the entire network (consisting of the two RTRBMs and the NN) acts as a deep recurrent NN that can be fine-tuned. Through VC experiments, we confirm the high performance of our method relative to conventional VC methods, such as approaches based on Gaussian mixture models and on NNs, especially in terms of objective criteria.
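The abstract describes the conversion pipeline only in words, so the following is a minimal sketch, under stated assumptions, of the data flow it outlines: a source RTRBM turns an acoustic sequence into hidden features using the standard RTRBM mean-field recurrence r_t = sigmoid(W v_t + U r_{t-1} + b_h), an NN maps those features into the target speaker's hidden space, and the target RTRBM maps them back to acoustic features. Everything here is hypothetical and not taken from the paper: the names `RTRBM`, `encode`, `decode`, and `convert`, the Gaussian visible units decoded as W^T r_t + b_v, the single sigmoid layer used as the mapping NN, the toy dimensions, and the random, untrained weights. RTRBM pre-training and the end-to-end fine-tuning are omitted.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(M, x):
    # M is a list of rows; returns the matrix-vector product M x.
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def add(*vecs):
    # Element-wise sum of equally sized vectors.
    return [sum(vals) for vals in zip(*vecs)]


class RTRBM:
    """Hypothetical toy RTRBM: W (hidden x visible), U (hidden x hidden), biases b_h, b_v."""

    def __init__(self, n_vis, n_hid, rng):
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_vis)] for _ in range(n_hid)]
        self.U = [[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_hid)]
        self.b_h = [0.0] * n_hid
        self.b_v = [0.0] * n_vis

    def encode(self, frames):
        # Mean-field hidden sequence r_t = sigmoid(W v_t + U r_{t-1} + b_h), starting
        # from a zero previous state, so each r_t carries context from earlier frames.
        r_prev = [0.0] * len(self.b_h)
        hidden = []
        for v in frames:
            r = [sigmoid(a) for a in add(matvec(self.W, v), matvec(self.U, r_prev), self.b_h)]
            hidden.append(r)
            r_prev = r
        return hidden

    def decode(self, hidden):
        # Assumed Gaussian visible units with unit variance: mean v_t = W^T r_t + b_v.
        Wt = transpose(self.W)
        return [add(matvec(Wt, r), self.b_v) for r in hidden]


def convert(src, nn_W, nn_b, tgt, frames):
    # Source features -> source hidden -> (one-layer NN) -> target hidden -> target features.
    h_src = src.encode(frames)
    h_tgt = [[sigmoid(a) for a in add(matvec(nn_W, h), nn_b)] for h in h_src]
    return tgt.decode(h_tgt)


rng = random.Random(0)
src, tgt = RTRBM(4, 3, rng), RTRBM(4, 3, rng)
nn_W = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(3)]
nn_b = [0.0] * 3
frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
converted = convert(src, nn_W, nn_b, tgt, frames)
print(len(converted), len(converted[0]))  # 5 4
```

With trained weights, this same composition of encoding, mapping, and decoding is the stack that could be fine-tuned end to end as one recurrent network; with random weights it only demonstrates shapes and data flow.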


Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Mar 1, 2015
Citations: 75
License type: implied-oa

Similar Papers

High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion
Toru Nakashika ... Tetsuya Takiguchi
14 Sep 2014

Many-to-many voice conversion based on multiple non-negative matrix factorization
Ryo Aihara ... Tetsuya Takiguchi
06 Sep 2015

Multimodal voice conversion based on non-negative matrix factorization
Kenta Masaka ... Tetsuya Takiguchi
EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2015
04 Sep 2015

Multimodal exemplar-based voice conversion using lip features in noisy environments
Kenta Masaka ... Ryo Aihara
14 Sep 2014
