Abstract

Artificial intelligence now underpins a wide range of applications, one of which is Spoken Language Identification (S-LID), a long-standing challenge and an important research area in speech signal processing. This paper addresses S-LID for Human-Computer Interaction (HCI) based applications by classifying languages from three multi-lingual databases, namely CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages, VoxForge, and the Indian Institute of Technology, Madras (IIT-Madras) speech corpus, using their Mel-Spectrogram features and Relative Spectral Transform - Perceptual Linear Prediction (RASTA-PLP) features. A new hybrid Feature Selection (FS) algorithm has been developed by combining the versatile Harmony Search (HS) algorithm with a recent nature-inspired algorithm called the Naked Mole-Rat (NMR) algorithm. It selects the best subset of features and reduces model complexity so that the model trains faster. The selected feature set is fed to five classifiers, namely Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Multi-layer Perceptron (MLP), Naïve Bayes (NB) and Random Forest (RF). The evaluation measures used in this paper are precision, recall, F1-score, classification accuracy and number of selected features. Using RF, an accuracy of 99.89% is achieved on CSS10, 98.22% on VoxForge and 99.75% on the IIT-Madras speech corpus. Furthermore, the proposed algorithm is found to outperform 15 standard meta-heuristic FS algorithms. The source code of this work is available at: https://github.com/CodeChef97dotcom/HS-NMR.git.
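As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes classification accuracy together with precision, recall and F1-score for a multi-class language classifier. Macro averaging over languages is an assumption made here, since the abstract does not state which averaging scheme is used, and the function names `per_class_counts` and `evaluate` are hypothetical rather than taken from the authors' repository.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    return tp, fp, fn


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro averaging (unweighted mean over labels) is an assumption of
    this sketch, not something stated in the paper.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = per_class_counts(y_true, y_pred)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    truth = ["en", "en", "fr", "fr", "de", "de"]
    preds = ["en", "fr", "fr", "fr", "de", "en"]
    print(evaluate(truth, preds))
```

The fifth measure, number of selected features, is simply the size of the feature subset returned by the FS algorithm and needs no separate computation.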

Highlights

  • Spoken Language Identification (S-LID) is a process of identifying and classifying a digitized natural spoken language by performing computational linguistic methods on the given content or data

  • CSS10 is a collection of single-speaker speech datasets for 10 languages, consisting of short audio clips from LibriVox audiobooks

  • Of the 17 available languages, 6 have been used, namely English, French, German, Italian, Russian and Spanish. These were chosen because the quality of their audio files is relatively better than that of the others, and the length and format of the files are appropriate for this experiment

Summary

Introduction

Spoken Language Identification (S-LID) is a process of identifying and classifying a digitized natural spoken language by performing computational linguistic methods on the given content or data [1]. This classification is made from a set of possible target languages [2], either from a closed set, where all possibilities are known, or from an open set, where unknown languages are included in the test corpora. S-LID has always been a challenging problem owing to variations in the type of speech input and the difficulty of understanding how human beings comprehend and interpret speech in different conditions [7]. This makes it an important research topic in the field of speech signal processing.

The associate editor coordinating the review of this manuscript and approving it for publication was Md.
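The difference between closed-set and open-set identification can be sketched in a few lines. The example below is an illustration only: the function `identify`, the `reject_threshold` parameter and the score values are hypothetical, and rejecting a low-confidence top score is just one simple way an open-set system might flag an unknown language, not necessarily the approach taken in this paper.

```python
def identify(scores, reject_threshold=None):
    """Pick a language from per-language scores (higher means more likely).

    Closed set: always return the best-scoring target language.
    Open set (assumed scheme): return "unknown" when even the best
    score falls below reject_threshold.
    """
    if not scores:
        raise ValueError("scores must contain at least one language")
    best = max(scores, key=scores.get)
    if reject_threshold is not None and scores[best] < reject_threshold:
        return "unknown"
    return best


if __name__ == "__main__":
    # Made-up scores for illustration; not taken from the paper.
    confident = {"English": 0.7, "French": 0.2, "German": 0.1}
    uncertain = {"English": 0.4, "French": 0.35, "German": 0.25}

    print(identify(confident))                        # closed set
    print(identify(uncertain))                        # closed set, forced choice
    print(identify(uncertain, reject_threshold=0.5))  # open set, may reject
```

In the closed-set case the classifier is forced to choose among the known languages even when no score is convincing, which is why open-set evaluation is generally the harder setting.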

