Construction and Analysis of Tibetan AMDO Dialect Speech Dataset for Speech Synthesis

Xinyi Zhang, Jianguo Wei, Xinyue Zhao, Wenhuan Lu, Yi Zhu

doi:10.1109/o-cocosda202152914.2021.9660562

Abstract

In recent years, speech synthesis technology has advanced rapidly with the development of deep learning. Tibetan speech synthesis, however, has been held back by the small size of the available corpora: progress has been slow and in-depth research is lacking. Constructing a Tibetan speech synthesis dataset is therefore of great significance. In this paper, to meet the needs of neural-network-based speech synthesis systems, we design a speech corpus that reflects the characteristics of the Tibetan Amdo dialect, using phoneme balancing, Tibetan character frequency analysis, and Tibetan character word-formation analysis. Using professional recording equipment, we recorded two professional hosts speaking the Tibetan Amdo dialect and built a 60-hour Tibetan Amdo dialect speech dataset. Experiments with a mainstream end-to-end speech synthesis framework show that the synthesized speech is natural and intelligible, which demonstrates the effectiveness and usefulness of our dataset.

Full Text