Abstract

To build automatic speech recognition (ASR) systems with a low word error rate (WER), a large speech and text corpus is needed. Corpus preparation is therefore the first step in developing an ASR system for a language with few available transcribed speech resources. Turkish is a language with limited resources for ASR, so developing a Turkish transcribed speech corpus comparable to the corpora of high-resource languages is crucial for improving and promoting Turkish speech recognition research. In this study, we constructed a viable alternative to classical transcribed-corpus preparation techniques for collecting Turkish speech data. The presented approach combines three methods. First, subtitles, which are mainly supplied for people with hearing difficulties, were used as transcriptions for speech utterances obtained from movies. Second, data were collected via a mobile application. Third, a transfer learning approach was applied to the session records (video and text) of the Grand National Assembly of Turkey. We also provide initial Turkish speech recognition results for artificial neural network and Gaussian mixture model based acoustic models. The models were trained on the newly collected corpus together with existing corpora published by the Linguistic Data Consortium. Test results on the existing corpora show the relative contribution of corpus variability to a comparable speech recognition task. The decrease in WER after including the new corpus became more evident as the amount of verified data increased, compensating for the status of Turkish as a low-resource language. These results also demonstrate the importance of the corpus and the language model for the success of a Turkish ASR system in further studies.

Highlights

  • The primary function of an automatic speech recognition (ASR) system is to automatically convert human speech into transcribed text

  • Word error rate (WER) is calculated as WER = (S + D + I) / N, where N is the total number of words in the reference, D is the number of deletions in the hypothesis with respect to the reference, S is the number of substitutions, and I is the number of insertions

  • The deep neural network (DNN) based system performed better than the Gaussian mixture model (GMM) based system, showing that DNNs are a better choice than GMMs for acoustic modelling in Turkish ASR systems
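The WER formula in the highlights can be computed with a standard Levenshtein (edit-distance) alignment over words. The sketch below is a minimal pure-Python illustration; the `wer` helper and the example strings are ours, not taken from the paper:

```python
def wer(reference, hypothesis):
    """Word error rate: WER = (S + D + I) / N, computed as the
    word-level Levenshtein distance divided by the reference length N."""
    ref = reference.split()
    hyp = hypothesis.split()
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits (S + D + I) between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # i deletions
    for j in range(m + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,              # substitution (or match)
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[n][m] / n

print(wer("the cat sat", "the cat sat"))  # 0.0 (perfect hypothesis)
print(wer("a b c d", "a x c"))            # 0.5 (1 substitution + 1 deletion over N=4)
```

In practice, scoring toolkits report WER as a percentage (multiply by 100) and may exceed 1.0 when insertions outnumber reference words.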



Introduction

The primary function of an automatic speech recognition (ASR) system is to automatically convert human speech into transcribed text. ASR can be used to analyze social media data, which contain text as well as large amounts of audio, video, and image data; in particular, ASR can transcribe the audio recordings and videos shared on social media [2]. A typical ASR system works with models trained by machine learning algorithms based on statistical pattern classification [3]. Machine learning algorithms follow two main approaches: supervised and unsupervised learning. Supervised learning is usually used for classification and requires labelled data as a training set, whereas unsupervised learning is usually used for clustering unlabelled data.

