Abstract

The ongoing development of audio datasets for numerous languages has spurred research on smart speech recognition systems. Speech recognition is used in many emerging applications, such as smartphone dialing, airline reservations, and automatic wheelchairs. Urdu is the national language of Pakistan and is also widely spoken in other South Asian countries (e.g., India, Afghanistan). We therefore present a comprehensive dataset of spoken Urdu digits from 0 to 9. The dataset contains 25,518 sound samples collected from 740 participants. To evaluate it, we apply several existing classification algorithms, including Support Vector Machine (SVM), Multilayer Perceptron (MLP), and variants of EfficientNet, which serve as baselines. We also propose a convolutional neural network (CNN) for audio digit classification. Experiments with these models show that the proposed CNN is efficient and outperforms the baseline algorithms in classification accuracy.

Highlights

  • Deep Learning has been successful in multiple domains including image classification [1,2,3], text classification [4,5], speech recognition [6,7,8], and many more [9]

  • We propose a comprehensive Audio Urdu Digits dataset

  • We propose a convolutional neural network (CNN) for classification that, despite being a simple CNN, performs impressively compared with the more complex EfficientNet variants

Summary

Introduction

Deep learning has been successful in multiple domains, including image classification [1,2,3], text classification [4,5], speech recognition [6,7,8], and many more [9]. Across languages, data remain a key challenge for any deep learning task: they are difficult to find, and collecting them is time-consuming and tedious. Without data, research and development for a language is not possible. Robust and reliable progress in speech recognition therefore requires publicly available data that researchers can use to develop speech-related systems. No publicly available dataset exists for Urdu audio digit classification. Inspired by the need for audio digit recognition, we release the
