Analysis and solution to aliasing artifacts in neural waveform generation models

Zengqiang Shang, Haozhe Zhang, Pengyuan Zhang, Li Wang, Ta Li

doi:10.1016/j.apacoust.2022.109183

Abstract

In recent years, with the application of deep learning to speech synthesis, waveform generation models based on generative adversarial networks have achieved quality comparable to natural speech. In most waveform generators, a neural upsampling unit plays an essential role, as it upsamples acoustic features to the sample-point level. However, aliasing artifacts are observed in the generated speech regardless of whether transposed convolution, subpixel convolution, or nearest-neighbor interpolation is used as the temporal upsampling layer. According to the Shannon-Nyquist sampling theorem, non-ideal upsampling filters produce aliasing. This paper systematically analyzes how aliasing artifacts arise in waveform generators built on non-ideal upsampling. We investigate the HiFi-GAN and VITS generation processes and find that high-frequency spectral details are generated from low-frequency structures through nonlinear transformation. However, the nonlinear transformation cannot completely remove the low-frequency spectral imprint, which eventually manifests as spectral artifacts in the generated waveforms. Applying a low-pass filter after the upsampling layer suppresses the aliasing artifacts but causes a significant performance drop. The experimental results also show that aliasing speeds up training by filling high-frequency vacancies. We therefore propose mixing high-frequency components into the low-pass filtered features, which allows models to converge faster while naturally avoiding artifacts. In addition, to assess the efficacy of our method, we develop an artifact-detection algorithm based on structural similarity.
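
The spectral imprint left by a non-ideal upsampler can be illustrated outside of any neural network. The following is a minimal NumPy sketch, not the authors' code or experimental setup: it upsamples a single sine tone by sample repetition (nearest-neighbor interpolation), measures how much energy lands above the original Nyquist frequency, and then applies a windowed-sinc low-pass filter at that frequency. The function names, the test tone, the upsampling factor of 4, and the 101-tap Hamming-windowed filter are all assumptions chosen for illustration.

```python
import numpy as np

def nearest_neighbor_upsample(x, factor):
    """Upsample by repeating each sample `factor` times (zero-order hold)."""
    return np.repeat(x, factor)

def windowed_sinc_lowpass(x, cutoff, num_taps=101):
    """Low-pass filter x with a Hamming-windowed sinc FIR filter.

    `cutoff` is in cycles per sample (0 < cutoff < 0.5).
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    h /= h.sum()  # unity gain at DC
    return np.convolve(x, h, mode="same")

def band_energy_fraction(x, lo, hi):
    """Fraction of spectral energy in [lo, hi) cycles per sample."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x))
    mask = (freqs >= lo) & (freqs < hi)
    return spec[mask].sum() / spec.sum()

factor = 4                                   # illustrative upsampling factor
n = np.arange(1024)
low_rate = np.sin(2 * np.pi * 0.1 * n)       # tone at 0.1 cycles/sample

up = nearest_neighbor_upsample(low_rate, factor)
old_nyquist = 0.5 / factor                   # original Nyquist at the new rate

# Energy above the original Nyquist frequency consists of spectral images
# of the low-frequency tone that the repeat-based upsampler did not remove.
image_before = band_energy_fraction(up, old_nyquist, 0.5)

filtered = windowed_sinc_lowpass(up, cutoff=old_nyquist)
image_after = band_energy_fraction(filtered, old_nyquist, 0.5)

print(f"image energy fraction before filtering: {image_before:.4f}")
print(f"image energy fraction after filtering:  {image_after:.2e}")
```

In this toy setting the low-pass filter removes nearly all of the image energy and leaves the band above the original Nyquist frequency essentially empty, which is the kind of high-frequency vacancy the abstract refers to. How the paper's models fill that band is not something this sketch attempts to reproduce.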

Journal: Applied Acoustics
Publication Date: Jan 12, 2023
Citations: 1
