Abstract

Spoken language is the most common mode of human communication today. Efforts to build language identification systems for Indian languages have been quite limited owing to poor speaker and data availability. However, the need for Spoken Language Identification (SLID) in civil and defence applications is growing daily. Feature extraction is a basic and important step in LID. An audio sample is converted into a spectrogram, a visual representation of its spectrum of frequencies over time. Three such spectrogram visuals were generated, namely Log Spectrogram, Gammatonegram and IIR-CQT Spectrogram, for audio samples from the standardized IIIT-H Indic Speech Database. These visual representations capture language-specific details and the character of each language. The spectrogram images were then used as input to a Convolutional Neural Network (CNN). A classification accuracy of 98.86% was obtained using the proposed methodology.

Highlights

  • Innovative systems like Siri and Google Assistant depend on Automatic Speech Recognition (ASR)

  • The IIIT-H Indic speech database comprises text and speech data in Bengali, Hindi, Kannada, Malayalam, Marathi, Tamil and Telugu

  • The samples in the dataset were converted into three spectrogram visuals, viz. Log Spectrogram, Gammatonegram and Infinite Impulse Response Constant-Q Transform (IIR-CQT) Spectrogram
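As a minimal sketch of the first of these conversions (not the paper's exact pipeline), a log spectrogram can be computed with SciPy's short-time Fourier analysis. The sampling rate, window parameters and the synthetic 440 Hz test tone below are illustrative assumptions standing in for a real speech sample:

```python
import numpy as np
from scipy import signal

fs = 16000                                  # assumed sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 440 * t)             # synthetic tone standing in for speech

# Short-time Fourier transform: frequency bins, time frames, power spectrogram
f, frames, Sxx = signal.spectrogram(x, fs=fs, nperseg=512, noverlap=256)

# Log scale; the small offset avoids log(0) in silent frames
log_spec = 10 * np.log10(Sxx + 1e-10)
```

The `log_spec` array is what would be rendered as an image and fed to the classifier; the Gammatonegram and IIR-CQT variants replace the linear frequency axis with auditory-filterbank and constant-Q analyses respectively.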


Introduction

Innovative systems like Siri and Google Assistant depend on Automatic Speech Recognition (ASR). To work properly, these ASR frameworks require users to manually specify the input language. Traditional Language Identification (LID) systems use domain-specific knowledge to extract hand-crafted features from audio samples [4]. Deep Learning and Artificial Neural Networks (ANN) are considered the state of the art for pattern recognition problems [25]. A variety of computer vision tasks, such as image classification, show better performance with Deep Neural Networks. LID can be characterized as the task of recognizing the spoken language in any given utterance.
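This excerpt does not specify the CNN architecture used. As an illustration of the underlying idea only, the toy, untrained NumPy forward pass below maps a spectrogram-like image to language-class probabilities; the image size, single convolution filter, pooling layout and seven-class output (matching the seven languages in the IIIT-H database) are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernel):
    """Valid 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_classes = 7                                # Bengali ... Telugu, as in IIIT-H
spec = rng.standard_normal((64, 64))         # stand-in for a spectrogram image
kernel = rng.standard_normal((3, 3))         # one untrained convolution filter

feat = np.maximum(conv2d(spec, kernel), 0)   # convolution + ReLU -> (62, 62)
pooled = feat.reshape(31, 2, 31, 2).max(axis=(1, 3))  # 2x2 max pooling -> (31, 31)
W = rng.standard_normal((n_classes, pooled.size)) * 0.01
probs = softmax(W @ pooled.ravel())          # dense layer + softmax over languages
pred = int(np.argmax(probs))                 # index of the predicted language
```

A trained network would learn many such filters and dense weights from labelled spectrograms; this sketch only shows how a spectrogram image flows through convolution, pooling and a softmax classifier.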

