Learning Structured Sparse Representations for Voice Conversion

Shaojin Ding, Ricardo Gutierrez-Osuna, Guanlong Zhao, Christopher Liberatore

doi:10.1109/taslp.2019.2955289

Open Access

https://doi.org/10.1109/taslp.2019.2955289

Abstract

Sparse-coding techniques for voice conversion assume that an utterance can be decomposed into a sparse code that carries only linguistic content, and a dictionary of atoms that captures the speakers' characteristics. However, conventional dictionary-construction and sparse-coding algorithms rarely meet this assumption. As a result, the sparse code is no longer speaker-independent, which lowers voice-conversion performance. In this paper, we propose a Cluster-Structured Sparse Representation (CSSR) that improves the speaker independence of the representations. CSSR consists of two complementary components: a Cluster-Structured Dictionary Learning module that groups atoms in the dictionary into clusters, and a Cluster-Selective Objective Function that encourages each speech frame to be represented by atoms from a small number of clusters. We conducted four experiments on the CMU ARCTIC corpus to evaluate the proposed method. In the first, an ablation study, results show that each of the two CSSR components enhances speaker independence, and that combining both components leads to further improvements. In the second experiment, we find that CSSR uses increasingly large dictionaries more efficiently than phoneme-based representations by allowing finer-grained decompositions of speech sounds. In the third experiment, results from objective and subjective measurements show that CSSR outperforms prior voice-conversion methods, improving the acoustic quality of the synthesized speech while retaining the target speaker's voice identity. Finally, we show that CSSR captures latent (i.e., phonetic) information in the speech signal.
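
The idea of coding a frame with atoms from only a few clusters, then swapping dictionaries to convert the voice, can be illustrated with a small sketch. The abstract does not give the actual Cluster-Selective Objective Function, so the code below substitutes a standard group-lasso penalty solved by proximal gradient descent; this is an assumption for illustration, not the authors' method. The function names, the random dictionaries `D_src` and `D_tgt`, the fixed cluster assignment in `groups`, and all parameter values are hypothetical.

```python
import numpy as np

def group_soft_threshold(w, groups, thresh):
    """Proximal operator of the group-lasso penalty: shrink each cluster's
    coefficient block toward zero and drop blocks whose norm is below thresh."""
    out = np.zeros_like(w)
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * w[idx]
    return out

def cluster_sparse_code(x, D, groups, lam=0.1, n_iter=500):
    """Minimize 0.5 * ||x - D w||^2 + lam * sum_g ||w_g||_2 by proximal
    gradient descent (a stand-in objective, not the one from the paper)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - x)
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w

# Toy setup (all values hypothetical): paired source/target dictionaries whose
# atoms are already grouped into clusters of equal size.
rng = np.random.default_rng(0)
n_dims, n_clusters, atoms_per_cluster = 20, 4, 3
n_atoms = n_clusters * atoms_per_cluster
D_src = rng.standard_normal((n_dims, n_atoms))
D_tgt = rng.standard_normal((n_dims, n_atoms))
groups = [np.arange(c * atoms_per_cluster, (c + 1) * atoms_per_cluster)
          for c in range(n_clusters)]

# A synthetic source frame built from the atoms of cluster 1 only.
w_true = np.zeros(n_atoms)
w_true[groups[1]] = [1.0, -0.5, 0.8]
x_src = D_src @ w_true

# Encode with the source dictionary, then decode with the target dictionary.
w = cluster_sparse_code(x_src, D_src, groups, lam=0.05)
active = [c for c, idx in enumerate(groups) if np.linalg.norm(w[idx]) > 1e-8]
y_converted = D_tgt @ w

print("active clusters:", active)
print("reconstruction error:", np.linalg.norm(x_src - D_src @ w))
```

The group penalty acts on whole clusters rather than individual atoms, which is one common way to encourage a frame to draw on few clusters. How CSSR learns the clusters and defines its own objective is described in the full paper, not here.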

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Dec 5, 2019
Citations: 68
License type: publisher-specific, author manuscript

Similar Papers

Sparse Incomplete Representations: A Potential Role of Olfactory Granule Cells
Alexei A Koulakov ... Dmitry Rinberg
Neuron | VOL. 72 | 01 Oct 2011

S184. Sparse representation and classification of neural spikes using supervised dictionary learning
Ahmed Dallal ... Zhi-Hong Mao
Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section | VOL. 129 | 01 May 2018

Signal structure: from manifolds to molecules and structured sparsity
01 Jan 2015

Sparse coding with fast image alignment via large displacement optical flow
Xiaoxia Sun ... Nasser M Nasrabadi
01 Mar 2016
