Distilling Sequence-to-Sequence Voice Conversion Models for Streaming Conversion Applications

Kou Tanaka, Shogo Seki, Hirokazu Kameoka, Takuhiro Kaneko

doi:10.1109/slt54892.2023.10023432

Abstract

This paper describes a method for distilling a recurrent-based sequence-to-sequence (S2S) voice conversion (VC) model. Although recent VC methods achieve increasingly high quality, streaming conversion remains a challenge for practical applications. To achieve streaming VC, the conversion model needs a streamable structure, that is, causal layers rather than non-causal layers. Motivated by this constraint and recent advances in S2S learning, we apply the teacher-student framework to recurrent-based S2S-VC models. A major challenge is how to minimize the degradation caused by causal layers, which mask future input information. Experimental evaluations show that, except for male-to-female speaker conversion, our approach maintains the teacher model's performance in subjective evaluations despite the streamable structure of the student model. Audio samples can be accessed at http://www.kecl.ntt.co.jp/people/tanaka.ko/projects/dists2svc.
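To make the causal versus non-causal distinction concrete, the following is a minimal sketch, not the authors' model: the paper uses recurrent-based S2S layers, whereas this example uses a plain 1-D filter because it shows the masking of future input in a few lines. The function name `conv1d`, the kernel values, and the input signal are all assumptions chosen for illustration.

```python
def conv1d(x, kernel, causal):
    """Apply a 1-D filter with zero padding; the output has len(x) samples."""
    k = len(kernel)
    # Causal: pad only on the left, so output[t] depends on x[t-k+1 .. t].
    # Non-causal: pad on both sides, so output[t] also depends on future samples.
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = [0.0] * left + list(x) + [0.0] * right
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(x))
    ]


# Hypothetical input: two signals that differ only in the final (future) sample.
kernel = [0.25, 0.5, 0.25]
x_a = [1.0, 2.0, 3.0, 4.0, 5.0]
x_b = [1.0, 2.0, 3.0, 4.0, 100.0]

# With the causal filter, outputs before the last step ignore the changed sample.
causal_same = conv1d(x_a, kernel, True)[:4] == conv1d(x_b, kernel, True)[:4]

# With the non-causal filter, the output at t=3 already sees the sample at t=4.
noncausal_same = conv1d(x_a, kernel, False)[3] == conv1d(x_b, kernel, False)[3]

print(causal_same, noncausal_same)
```

Because the causal filter's output at time t never depends on later samples, it can be computed as input arrives, which is the property a streaming model needs. The cost is the loss of future context that the abstract identifies as the main source of degradation.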
