Abstract

Today, speech synthesis is a part of our daily lives in computers all around the world. Central Kurdish Speech Corpus Construction is a speech corpus that is a primary data source for developing a speech system. There are still two main issues that prevent them from achieving the best possible performance, the lack of efficiency in training and analysis, and the difficulty in modelling. The biggest obstacle against text-to-speech in the Kurdish language is that there is a lack of text and speech recognition tools compounded by the fact that around 30 million people speak the Kurdish language in different countries. To address this issue, this corpus introduced a large vocabulary of Kurdish Text-to-Speech Dataset (KTTS, Gigant), including a pronunciation lexicon and speech corpus for the Central Kurdish dialect. A variety of subjects is comprised to record these sentences. The sentences are recorded in a voice recording studio by a Kurdish man who is a dubber. The goal of the speech corpus is to create a collection of sentences that accurately reflect the real data about the Central Kurdish dialect. A combination of audio and visual sources is used to record the 6,078 sentences of 12 document topics. They were recorded in a controlled environment using microphones that were not noisy. The total record duration is 13.63 h. The recorded sentences are in the “.wav” format.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.