Abstract

With recent advances in machine learning and artificial intelligence, applications based on spoken language identification increasingly affect people's day-to-day lives. Western countries have had access to such applications for some time; however, they have not gained much popularity in multilingual countries like India, owing to various complexities. In this paper, we address this issue by attempting to identify different Indian languages using well-known features such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Discrete Wavelet Transform (DWT), and Gammatone Frequency Cepstral Coefficients (GFCC), as well as deep learning based features such as i-vectors and x-vectors extracted from the audio signals. A comparison of the initial results shows that the combination of MFCC and LPC performs best. We then develop a new nature-inspired feature selection (FS) algorithm by hybridizing the Binary Bat Algorithm (BBA) with Late Acceptance Hill-Climbing (LAHC) to select an optimal subset of these feature vectors, reducing model complexity and speeding up training. Using a Random Forest (RF) classifier, we achieve an accuracy of 92.35% on the Indic TTS database developed by IIT-Madras and 100% on the Indic Speech database developed by the Speech and Vision Laboratory (SVL), IIIT-Hyderabad. The proposed algorithm also outperforms many standard meta-heuristic FS algorithms. The source code of this work is available at: https://github.com/CodeChef97dotcom/Feature-Selection

Highlights

  • Speech is one of the most innate human capabilities

  • We explore a new approach to feature selection (FS): a hybrid of the Binary Bat Algorithm (BBA) and the Late Acceptance Hill-Climbing (LAHC) algorithm, used to classify Indian languages based on their Mel-Frequency Cepstral Coefficient (MFCC) and Linear Prediction Coefficient (LPC) features

  • We performed experiments on a database of 7 Indic languages [51], developed by the Speech and Vision Laboratory (SVL) at IIIT-Hyderabad. This database consists of 1000 utterances for each of the 7 languages, and each sentence is available as a separate audio clip
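The Late Acceptance Hill-Climbing step mentioned in the highlights can be illustrated with a minimal sketch. This is not the authors' implementation: the Binary Bat Algorithm part of the hybrid is left out, and the fitness function, the feature count, and the names `toy_fitness`, `lahc_select`, `N_FEATURES`, and `INFORMATIVE` are hypothetical choices made only for illustration. In a real pipeline the fitness would presumably be a classifier's validation accuracy on the selected MFCC and LPC features, which is an assumption here.

```python
import random

# Hypothetical toy setup: 12 features, of which indices 0, 3 and 5 are "informative".
N_FEATURES = 12
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    """Toy score: +10 per informative feature selected, -1 per feature selected."""
    selected = [i for i, keep in enumerate(mask) if keep]
    return 10 * sum(i in INFORMATIVE for i in selected) - len(selected)


def lahc_select(fitness, n_features, history_len=20, iterations=5000, seed=0):
    """Late Acceptance Hill-Climbing over binary feature masks (maximizes fitness)."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_fit = fitness(current)
    best, best_fit = current[:], current_fit
    # History of past fitness values; a candidate is compared against the
    # value recorded history_len iterations ago as well as the current one.
    history = [current_fit] * history_len

    for i in range(iterations):
        # Neighbour: flip one randomly chosen feature bit.
        candidate = current[:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]
        cand_fit = fitness(candidate)

        slot = i % history_len
        # Late acceptance rule: accept if no worse than the current solution
        # or no worse than the solution held history_len steps earlier.
        if cand_fit >= current_fit or cand_fit >= history[slot]:
            current, current_fit = candidate, cand_fit
        if current_fit > best_fit:
            best, best_fit = current[:], current_fit
        history[slot] = current_fit

    return best, best_fit


best_mask, best_score = lahc_select(toy_fitness, N_FEATURES)
print([i for i, keep in enumerate(best_mask) if keep], best_score)
```

With `history_len=1` the rule reduces to plain hill climbing that accepts equal or better neighbours; a longer history tolerates more worsening moves early in the search, which is the property that makes LAHC attractive as a local-search partner for a population-based method.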


Summary

Introduction

Speech is one of the most innate human capabilities. When we speak with one another, we use not just words and their associated emotions and sentiments to convey meaning and get our opinions across. There are many features associated with spoken language that allow us to deliver information that [...] Spoken language involves the actual use of speech or related utterances to convey meaning and share thoughts or other information. Processing spoken language involves human-computer interaction (HCI), which has improved significantly over the last decade. Automatic language identification plays a vital role in a wide range of services. Almost everyone now has a smartphone, which makes life much easier. People can control daily activities like calling someone, turning on

The associate editor coordinating the review of this manuscript and approving it for publication was K.

