Abstract

Automatic Speech Recognition (ASR), an active area of speech processing, is now used in real-world applications built with a variety of techniques, among which artificial neural networks are the most popular. Improving performance and making these systems robust to noise remain ongoing challenges. This paper describes the development of an ASR system for the Central Kurdish language (CKB) using transfer learning with Deep Neural Networks (DNNs). The Acoustic Model (AM), trained on the AsoSoft CKB speech dataset, combines Mel-Frequency Cepstral Coefficients (MFCCs) for feature extraction with a Long Short-Term Memory (LSTM) network and a Connectionist Temporal Classification (CTC) output layer. In addition, an N-gram language model is built on a large collected text corpus of about 300 million tokens; the same corpus is used to extract a dynamic lexicon of over 2.5 million CKB words. The results show that the DNN-based approach outperforms classical statistical models. By combining transfer learning and language model adaptation, the proposed method achieves a 0.22% word error rate, which surpasses the best previously reported result for CKB.
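
To make the described acoustic-model architecture concrete, the following is a minimal PyTorch/torchaudio sketch of an MFCC -> bidirectional LSTM -> CTC pipeline. It is not the authors' implementation: the framework, layer sizes, sampling rate, and grapheme-set size are all assumptions for illustration only; in a transfer-learning setup the LSTM weights would be initialized from a model pre-trained on a larger speech corpus rather than from scratch.

```python
# Hypothetical sketch: MFCC features -> BiLSTM -> CTC loss.
# Hyperparameters (hidden size, layers, class count, sample rate) are assumptions,
# not values from the paper.
import torch
import torch.nn as nn
import torchaudio

NUM_MFCC = 13          # assumed number of MFCC coefficients per frame
NUM_CLASSES = 40       # assumed CKB grapheme inventory + 1 for the CTC blank

# Feature extraction: MFCCs from raw audio (16 kHz assumed).
mfcc_transform = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=NUM_MFCC)

class LstmCtcAcousticModel(nn.Module):
    def __init__(self, num_mfcc=NUM_MFCC, hidden=256, layers=3, num_classes=NUM_CLASSES):
        super().__init__()
        # Bidirectional LSTM stack over MFCC frames; in transfer learning these
        # weights would typically be loaded from a pre-trained source model.
        self.lstm = nn.LSTM(input_size=num_mfcc, hidden_size=hidden,
                            num_layers=layers, bidirectional=True, batch_first=True)
        # Per-frame projection to grapheme (plus blank) log-probabilities.
        self.proj = nn.Linear(2 * hidden, num_classes)

    def forward(self, feats):                       # feats: (batch, time, num_mfcc)
        out, _ = self.lstm(feats)
        return self.proj(out).log_softmax(dim=-1)   # (batch, time, classes)

model = LstmCtcAcousticModel()
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

waveform = torch.randn(1, 16000)                    # 1 s of dummy audio
feats = mfcc_transform(waveform).transpose(1, 2)    # (batch, time, num_mfcc)
log_probs = model(feats).transpose(0, 1)            # CTC expects (time, batch, classes)

targets = torch.tensor([[5, 12, 7, 3]])             # dummy grapheme indices
input_lengths = torch.tensor([log_probs.size(0)])
target_lengths = torch.tensor([targets.size(1)])
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

At decoding time, the frame-level CTC outputs would be combined with the N-gram language model and the lexicon described above (e.g., via beam search), which is where the language model adaptation contributes to the reported accuracy.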
