Phoneme Hallucinator: One-Shot Voice Conversion via Set Expansion

Siyuan Shan, Junier B Oliva, Amartya Banerjee, Yang Li

doi:10.1609/aaai.v38i13.29411

Abstract

Voice conversion (VC) aims to alter a person's voice so that it sounds like the voice of another person while preserving the linguistic content. Existing methods face a trade-off between content intelligibility and speaker similarity: methods with higher intelligibility usually have lower speaker similarity, while methods with higher speaker similarity usually require plenty of target speaker voice data to achieve high intelligibility. In this work, we propose a novel method, Phoneme Hallucinator, that achieves the best of both worlds. Phoneme Hallucinator is a one-shot VC model; it adopts a novel model to hallucinate diversified and high-fidelity target speaker phonemes based on just a short target speaker voice sample (e.g., 3 seconds). The hallucinated phonemes are then used to perform neighbor-based voice conversion. Our model is a text-free, any-to-any VC model that requires no text annotations and supports conversion to any unseen speaker. Quantitative and qualitative evaluations show that Phoneme Hallucinator outperforms existing VC methods in both intelligibility and speaker similarity.
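The abstract does not spell out the feature representation or the matching rule, so the following is only a minimal sketch of what neighbor-based conversion over an expanded target set might look like. It assumes frame-level feature vectors, cosine similarity, and averaging of the k nearest target features; all of these are illustrative assumptions, not the authors' stated design. The names `expand_target_set` and `knn_convert` are hypothetical, and the "hallucinated" vectors are hand-written placeholders standing in for the output of the paper's generative model, which is not reproduced here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_target_set(real: Sequence[Vector],
                      hallucinated: Sequence[Vector]) -> List[Vector]:
    """Set expansion (assumed form): union of real and hallucinated features."""
    return list(real) + list(hallucinated)


def knn_convert(source_frames: Sequence[Vector],
                target_set: Sequence[Vector],
                k: int = 4) -> List[List[float]]:
    """Replace each source frame with the mean of its k most similar
    target-speaker features (assumed matching rule)."""
    if not target_set:
        raise ValueError("target_set must contain at least one feature vector")
    k = min(k, len(target_set))
    converted = []
    for frame in source_frames:
        # Rank every target feature by similarity to this source frame.
        ranked = sorted(target_set,
                        key=lambda t: cosine_similarity(frame, t),
                        reverse=True)
        neighbors = ranked[:k]
        dim = len(neighbors[0])
        converted.append([sum(n[d] for n in neighbors) / k for d in range(dim)])
    return converted


# Toy usage with 2-dimensional placeholder features.
real = [[1.0, 0.0], [0.0, 1.0]]            # features from the short target clip
hallucinated = [[0.9, 0.1], [0.1, 0.9]]    # placeholder for model output
target_set = expand_target_set(real, hallucinated)
converted = knn_convert([[1.0, 0.2]], target_set, k=2)
```

The point of the sketch is the role of the expanded set: with only a few seconds of target speech, many source frames would have no close neighbor, and a larger pool of target-like features gives the matching step more candidates to choose from. In a full system the converted features would still need to be turned back into a waveform by a vocoder, which is outside this sketch.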

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 1

