Abstract

Research in speech recognition is progressing, with numerous state-of-the-art results in recent times. However, relatively little research has been carried out on Automatic Speech Recognition (ASR) for low-resource languages. We present a method to develop a speech recognition model with minimal resources using the Mozilla DeepSpeech architecture. We have utilized freely available online computational resources for training, enabling similar approaches to be carried out for research on low-resource languages in financially constrained environments. We also present novel ways to build an efficient language model from publicly available web resources to improve ASR accuracy. The proposed ASR model achieves a best Word Error Rate (WER) of 24.7%, compared to 55% WER for Google speech-to-text. We have also demonstrated the semi-supervised development of a speech corpus using our trained ASR model, indicating a cost-effective approach to building a large-vocabulary corpus for low-resource languages. The trained Tamil ASR model and the training sets are released in the public domain and are available on GitHub.
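The 24.7% and 55% figures are Word Error Rates. WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, with the edit counts taken from a word-level Levenshtein alignment. The sketch below is a minimal illustration of that definition, assuming plain whitespace tokenisation; the paper's evaluation may apply its own text normalisation first, and the function name is ours.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution in four reference words
    print(word_error_rate("the cat sat down", "the cap sat down"))  # 0.25
```

Read this way, a WER of 24.7% means roughly one word error for every four reference words. Over a whole test set, WER is typically reported as total edits divided by total reference words rather than as an average of per-utterance rates.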

Highlights

  • Recent advances in Automatic Speech Recognition (ASR) over the past couple of years are commendable, surpassing even human perception

  • The DeepSpeech architecture needs Graphics Processing Unit (GPU) resources to complete training in minimal time, so GPU is selected as the hardware accelerator in Google Colaboratory (GC)

  • Even though GC imposes usage time limits on GPU sessions, checkpoints are saved at regular intervals so that training can be resumed once the limit is lifted
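The checkpoint-and-resume pattern in the last highlight can be sketched as follows. DeepSpeech manages its own checkpoints, so this is not its file format or training loop: it is a generic, hypothetical illustration in which `train`, `latest_step`, the `ckpt-<step>.json` naming and the `max_steps_this_session` parameter (standing in for a Colaboratory time limit) are all our own assumptions, and the "loss" update is a placeholder for a real optimisation step.

```python
import json
import os
import tempfile


def latest_step(ckpt_dir: str) -> int:
    """Return the highest step saved in ckpt_dir, or 0 if there are no checkpoints."""
    steps = [
        int(name[len("ckpt-"):-len(".json")])
        for name in os.listdir(ckpt_dir)
        if name.startswith("ckpt-") and name.endswith(".json")
    ]
    return max(steps, default=0)


def train(ckpt_dir: str, total_steps: int, save_every: int,
          max_steps_this_session: int) -> int:
    """Run one session, resuming from the newest checkpoint; return the step reached."""
    os.makedirs(ckpt_dir, exist_ok=True)
    state = {"step": 0, "loss": 1.0}
    start = latest_step(ckpt_dir)
    if start:
        with open(os.path.join(ckpt_dir, f"ckpt-{start}.json")) as f:
            state = json.load(f)  # resume where the last saved checkpoint left off

    steps_run = 0
    while state["step"] < total_steps and steps_run < max_steps_this_session:
        state["step"] += 1
        state["loss"] *= 0.99  # placeholder for a real training step
        steps_run += 1
        if state["step"] % save_every == 0:
            path = os.path.join(ckpt_dir, f"ckpt-{state['step']}.json")
            with open(path, "w") as f:
                json.dump(state, f)
    return state["step"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Each call simulates one session cut off after 250 steps.
        for _ in range(3):
            print(train(d, total_steps=500, save_every=100,
                        max_steps_this_session=250))  # 250, then 450, then 500
```

Work done after the last checkpoint in a session (steps 201 to 250 in the first session above) is repeated in the next one, so the save interval trades disk writes against recomputation lost when a session is cut off.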


Summary

Introduction

Recent advances in Automatic Speech Recognition (ASR) over the past couple of years are commendable, surpassing even human perception. We investigate the use of open-source speech recognition toolkits to build a speech recognition model for the Tamil language. The resulting pre-trained model provides out-of-the-box support for transfer learning in tasks such as keyword spotting and isolated word recognition. We present a novel approach to building a pre-trained model with low resources that substantially assists in developing a massive speech corpus using semi-supervised learning. To our knowledge, this is the first attempt to use the Common Voice dataset and release a pre-trained ASR model for the Tamil language.
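This summary does not say how machine transcripts are selected during semi-supervised corpus development. One common pattern, shown here only as an illustrative assumption and not as the authors' method, is to transcribe unlabelled audio with the trained model, accept transcripts whose confidence clears a threshold, and route the rest for manual correction. The `Utterance` fields, the confidence score, the threshold and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    audio_path: str    # hypothetical path to an unlabelled clip
    transcript: str    # machine transcript produced by the trained ASR model
    confidence: float  # hypothetical per-utterance score in [0, 1]


def split_by_confidence(
    utterances: List[Utterance], threshold: float
) -> Tuple[List[Utterance], List[Utterance]]:
    """Accept confident machine transcripts; queue the rest for manual correction."""
    accepted: List[Utterance] = []
    needs_review: List[Utterance] = []
    for utt in utterances:
        if utt.confidence >= threshold:
            accepted.append(utt)
        else:
            needs_review.append(utt)
    return accepted, needs_review


if __name__ == "__main__":
    batch = [
        Utterance("clip_001.wav", "transcript one", 0.93),
        Utterance("clip_002.wav", "transcript two", 0.41),
        Utterance("clip_003.wav", "transcript three", 0.88),
    ]
    auto, manual = split_by_confidence(batch, threshold=0.8)
    print([u.audio_path for u in auto])    # ['clip_001.wav', 'clip_003.wav']
    print([u.audio_path for u in manual])  # ['clip_002.wav']
```

Raising the threshold lowers the chance of admitting wrong labels into the corpus but increases the share of clips that still need human transcription, which is the cost the semi-supervised approach aims to reduce.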

Related Works
Is Tamil a Low Resource Language?
Structure of Tamil Language and Its Challenges
Tamil ASR System Architecture
Speech Corpus
ASR Architecture
Language Model
Model Training and Results
Training Setup
Transfer Learning for Isolated Tamil Digit Recognition
Semi-Supervised Development of Speech Corpus
Conclusion
