Abstract

This paper presents PeriodNet, a non-autoregressive (non-AR) waveform generative model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR raw waveform generative models have enabled the fast generation of high-quality waveforms. However, the variations of waveforms that these models can reconstruct are limited by the training data. In addition, typical non-AR models reconstruct a speech waveform from a single Gaussian input, despite the mixture of periodic and aperiodic signals in speech. These limitations may significantly affect the waveform generation process in applications such as singing voice synthesis, which require reproducing accurate pitch and natural sounds with less periodicity, including husky and breath sounds. To tackle these problems, PeriodNet uses a parallel or series model structure to model a speech waveform. Two sub-generators connected in parallel or in series take an explicit periodic and aperiodic signal (a sine wave and Gaussian noise) as input. Since PeriodNet models periodic and aperiodic components by focusing on whether these input signals are autocorrelated or not, it does not require external periodic/aperiodic decomposition during training. Experimental results show that our proposed structure improves the naturalness of generated waveforms. We also show that speech waveforms with a pitch outside the training data range can be generated more naturally.

Highlights


  • We show that PeriodNet can model a speech waveform while appropriately separating periodic and aperiodic components during the training process by comparing it with systems that use periodic and aperiodic waveforms pre-decomposed by using explicit decomposition techniques (Section V)

  • We propose a non-AR waveform generative model with a structure separating periodic and aperiodic components in speech waveforms, called “PeriodNet.” PeriodNet consists of two sub-generators connected in parallel or in series that take a sine-based input signal and a Gaussian noise signal, respectively, and it represents a speech waveform as the sum of the outputs of both generators
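The parallel structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the real sub-generators are conditioned neural networks, so fixed FIR filters stand in for them here, and all function names and filter weights are hypothetical. The sketch only shows the data flow: a sine excitation derived from the F0 contour feeds the periodic branch, Gaussian noise feeds the aperiodic branch, and the waveform is the sum of the two outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def sine_excitation(f0_hz, sr=16000):
    # Periodic (autocorrelated) input: a sine wave whose instantaneous
    # frequency follows the sample-level F0 contour.
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / sr
    return np.sin(phase)

def sub_generator(x, taps):
    # Stand-in for a neural sub-generator: a fixed FIR filter.
    # (Hypothetical weights; PeriodNet uses conditioned neural networks.)
    return np.convolve(x, taps, mode="same")

def periodnet_parallel(f0_hz, sr=16000):
    periodic_in = sine_excitation(f0_hz, sr)         # sine-based input
    aperiodic_in = rng.standard_normal(len(f0_hz))   # Gaussian noise input
    periodic_out = sub_generator(periodic_in, np.array([0.5, 0.3, 0.2]))
    aperiodic_out = sub_generator(aperiodic_in, np.array([0.2, 0.1]))
    return periodic_out + aperiodic_out              # sum of both branches

f0 = np.full(16000, 200.0)  # 1 s of a flat 200 Hz pitch contour
wav = periodnet_parallel(f0)
```

In the series variant, the output of one sub-generator would instead be passed as an additional input to the other rather than being summed independently.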


Summary

INTRODUCTION

Hono et al.: PeriodNet: A non-AR raw waveform generative model with a structure separating periodic and aperiodic components

Speech synthesis technology has been rapidly advancing with the introduction of neural networks (NNs). Since NN-based generative models can generate raw waveforms by conditioning on acoustic features [12], they have succeeded in replacing conventional vocoders, giving speech applications the benefit of generating high-quality speech waveforms [13]–[15]. However, these models have huge network architectures with AR mechanisms, which suffer from slow inference speed.

NEURAL WAVEFORM GENERATIVE MODELS
DETAILS OF TRAINING FRAMEWORK
EXPERIMENTAL CONDITIONS
Findings
CONCLUSION

