Tibetan speech synthesis based on an improved neural network

Yuntao Ding, Baojia Gong, Rangzhuoma Cai

doi:10.1051/matecconf/202133606012

Open Access

https://doi.org/10.1051/matecconf/202133606012

Abstract

Neural-network-based methods have become the mainstream approach to Tibetan speech synthesis. Among the available vocoders, the Griffin-Lim vocoder is widely used in Tibetan speech synthesis because it is relatively simple. To address the low fidelity of the Griffin-Lim vocoder, this paper replaces it with a WaveNet vocoder. The model first uses convolution operations and an attention mechanism to extract sequence features, then uses a linear projection and a feature amplification module to predict the mel spectrogram, and finally uses the WaveNet vocoder to synthesize the speech waveform. Experimental results show that the model achieves better performance in Tibetan speech synthesis.
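The feature-extraction step named in the abstract (a convolution followed by attention) can be illustrated with a minimal sketch. The abstract does not give kernel sizes, channel counts, or the attention variant, so everything below is an assumption for illustration: the function names `conv1d_same` and `self_attention` are hypothetical, the convolution shares one scalar kernel across all feature dimensions, and the attention is plain scaled dot-product self-attention rather than the paper's actual module.

```python
import math


def conv1d_same(seq, kernel):
    """Slide a 1-D kernel over the time axis with zero padding.

    seq:    list of T feature vectors, each a list of D floats.
    kernel: list of K scalar weights (odd K), shared across all D
            dimensions. This sharing is a simplification, not the
            paper's layer.
    As in most deep-learning libraries, this is cross-correlation
    (the kernel is not flipped).
    """
    pad = len(kernel) // 2
    t_len, dim = len(seq), len(seq[0])
    out = []
    for t in range(t_len):
        vec = [0.0] * dim
        for j, w in enumerate(kernel):
            src = t + j - pad
            if 0 <= src < t_len:  # positions outside the sequence count as zero
                for d in range(dim):
                    vec[d] += w * seq[src][d]
        out.append(vec)
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with queries = keys = values = seq.

    Returns (context, weights): context[t] is a weighted average of all
    frames, and weights[t] holds the attention distribution of frame t.
    """
    dim = len(seq[0])
    scale = math.sqrt(dim)
    context, weights = [], []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        w = softmax(scores)
        weights.append(w)
        context.append(
            [sum(w[i] * seq[i][d] for i in range(len(seq))) for d in range(dim)]
        )
    return context, weights


# Toy input: four time steps of 2-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
features = conv1d_same(embeddings, [0.25, 0.5, 0.25])
context, weights = self_attention(features)
```

In a full system, the `context` vectors would feed the linear projection that predicts mel-spectrogram frames; that stage and the feature amplification module are not sketched here because the abstract gives no detail about them.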

Highlights

Neural-network-based speech synthesis greatly reduces the error rate of speech synthesis because neural network units can learn independently and are trained by backpropagation, and the synthesized speech is closer to the human voice.
Building on [7], this paper proposes a Tibetan speech synthesis method based on an improved neural network. By constructing an improved neural network, using WaveNet
Amdo Tibetan has no tonal characteristics [10], and some of its 30 consonants have similar pronunciations, such as ཅ and ཇ, or ཨ and འ. To better distinguish similar pronunciations and make the synthesized Tibetan speech more natural, this paper proposes an improved neural network for Tibetan speech synthesis.

Summary

Introduction

Neural-network-based speech synthesis greatly reduces the error rate of speech synthesis because neural network units can learn independently and are trained by backpropagation, and the synthesized speech is closer to the human voice. As an important part of Chinese information processing, Tibetan speech synthesis is both a key component and a difficult problem in intelligent Tibetan human-computer interaction. Although research started late, Tibetan speech synthesis has gradually moved from waveform-concatenation-based synthesis [5] and statistical parametric synthesis [6] to neural-network-based synthesis [7,8]. In 2019, [7] first proposed neural-network-based Tibetan speech synthesis, which brought Tibetan speech synthesis into a new era.

Improved neural network structure

Waveform synthesis

Experiments

Objective experiment

Subjective experiment

Summary

Journal: MATEC Web of Conferences
Publication Date: Jan 1, 2021
Citations: 1
License type: CC BY 4.0

Similar Papers

Deep Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis
Weizhao Zhang ... Lili Wang
IEEE Access | VOL. 7
01 Jan 2019

Research on Tibetan Speech Synthesis Based on Fastspeech2
Ba Zu ... Zhijie Cai
22 Jul 2022

Meta-Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis
Weizhao Zhang ... Hongwu Yang
Applied Sciences | VOL. 12
28 Nov 2022

Improving Sequence-to-sequence Tibetan Speech Synthesis with Prosodic Information
Weizhao Zhang ... Hongwu Yang
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 22
22 Sep 2023
