Avocodo: Generative Adversarial Network for Artifact-Free Vocoder

Taejun Bak, Young-Sun Joo, Jinhyeok Yang, Jae-Sung Bae, Junmo Lee, Hanbin Bae

doi:10.1609/aaai.v37i11.26479

Abstract

Neural vocoders based on the generative adversarial neural network (GAN) have been widely used due to their fast inference speed and lightweight networks while generating high-quality speech waveforms. Since the perceptually important speech components are primarily concentrated in the low-frequency bands, most GAN-based vocoders perform multi-scale analysis that evaluates downsampled speech waveforms. This multi-scale analysis helps the generator improve speech intelligibility. However, in preliminary experiments, we discovered that the multi-scale analysis that focuses on the low-frequency bands causes unintended artifacts, e.g., aliasing and imaging artifacts, which degrade the quality of the synthesized speech waveform. Therefore, in this paper, we investigate the relationship between these artifacts and GAN-based vocoders and propose a GAN-based vocoder, called Avocodo, that allows the synthesis of high-fidelity speech with reduced artifacts. We introduce two kinds of discriminators to evaluate speech waveforms from various perspectives: a collaborative multi-band discriminator and a sub-band discriminator. We also utilize a pseudo quadrature mirror filter bank to obtain downsampled multi-band speech waveforms while avoiding aliasing. According to experimental results, Avocodo outperforms baseline GAN-based vocoders, both objectively and subjectively, while reproducing speech with fewer artifacts.
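The abstract's central point is that evaluating downsampled waveforms can introduce aliasing when the downsampling is not properly band-limited. The sketch below is a generic toy illustration of that effect, not Avocodo's implementation: Avocodo uses a pseudo quadrature mirror filter (PQMF) bank, which this code does not implement. All names (`tone_mix`, `lowpass_taps`, `decimate_naive`, `decimate_filtered`, `amplitude_at`) are hypothetical, and the 8000 Hz sample rate, the tone frequencies, and the 63-tap Hamming-windowed sinc filter are assumptions chosen only so the effect is easy to measure. A 3000 Hz tone is downsampled by 2; plain sample dropping folds it to 1000 Hz, whereas low-pass filtering below the new Nyquist frequency first suppresses it.

```python
import cmath
import math

FS = 8000                    # assumed input sampling rate (Hz) for this toy example
N = 512                      # signal length; both tones land exactly on DFT bins
LOW_HZ, HIGH_HZ = 500, 3000  # HIGH_HZ is above the post-decimation Nyquist (2000 Hz)
FACTOR = 2                   # downsampling factor
NEW_FS = FS // FACTOR
ALIAS_HZ = NEW_FS - HIGH_HZ  # where HIGH_HZ folds to after decimation: 1000 Hz


def tone_mix(n_samples):
    """Sum of a low and a high sinusoid, each with amplitude 1."""
    return [math.sin(2 * math.pi * LOW_HZ * n / FS)
            + math.sin(2 * math.pi * HIGH_HZ * n / FS)
            for n in range(n_samples)]


def lowpass_taps(num_taps=63, cutoff=0.2):
    """Hamming-windowed sinc low-pass FIR; cutoff in cycles/sample, DC gain 1."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        # Ideal low-pass impulse response 2*fc*sinc(2*fc*t), with the t == 0 limit.
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def circular_filter(x, taps):
    """Zero-phase circular convolution; exact here because x is N-periodic."""
    n_x, mid = len(x), (len(taps) - 1) // 2
    return [sum(h * x[(n - k + mid) % n_x] for k, h in enumerate(taps))
            for n in range(n_x)]


def decimate_naive(x, factor=FACTOR):
    """Keep every factor-th sample without any anti-aliasing filter."""
    return x[::factor]


def decimate_filtered(x, factor=FACTOR):
    """Low-pass below the new Nyquist (0.5 / factor) first, then drop samples."""
    return circular_filter(x, lowpass_taps(cutoff=0.4 / factor))[::factor]


def amplitude_at(x, freq_hz, fs):
    """Amplitude of a real sinusoid at freq_hz, read from a single DFT bin."""
    m = len(x)
    k = round(freq_hz * m / fs)
    coeff = sum(v * cmath.exp(-2j * math.pi * k * n / m) for n, v in enumerate(x))
    return 2 * abs(coeff) / m


if __name__ == "__main__":
    x = tone_mix(N)
    for name, y in (("naive", decimate_naive(x)), ("filtered", decimate_filtered(x))):
        print(f"{name:8s} {LOW_HZ} Hz: {amplitude_at(y, LOW_HZ, NEW_FS):.3f}  "
              f"alias at {ALIAS_HZ} Hz: {amplitude_at(y, ALIAS_HZ, NEW_FS):.4f}")
```

In this setup the naive path shows a spurious 1000 Hz component with roughly the same amplitude as the original 3000 Hz tone, while the filtered path attenuates it to near zero and leaves the 500 Hz tone essentially intact. Circular convolution is used only because the test signal is exactly periodic over `N` samples, which avoids edge transients that would blur the measurement.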

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 26, 2023
Citations: 6

Similar Papers

DSPGAN: A GAN-Based Universal Vocoder for High-Fidelity TTS by Time-Frequency Domain Supervision from DSP
Kun Song ... Lei Xie
04 Jun 2023

Construction of Sports Training Performance Prediction Model Based on a Generative Adversarial Deep Neural Network Algorithm
Gang Li
Computational Intelligence and Neuroscience | VOL. 2022 | 21 May 2022

Method and apparatus for generating modified speech from pitch-synchronous segmented speech waveforms
George S Kang ... Lawrence J Fransen
The Journal of the Acoustical Society of America | VOL. 107 | 01 Jan 1999

Speech Enhancement Using Generative Adversarial Network (GAN)
Mahmudul Huq ... Rytis Maskeliunas
01 Jan 2021
