Abstract

Spoken Language Identification (LID) is the process of determining and classifying the natural language of given content or a given dataset. To perform LID, the data must first be processed to extract useful features. According to the literature, feature extraction for LID is a mature process: standard LID features have been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC) coefficients and the Gaussian Mixture Model (GMM), culminating in the i-vector based framework. However, the learning process applied to the extracted features still needs to be improved (i.e. optimised) to capture all of the knowledge embedded in them. The Extreme Learning Machine (ELM) is an effective learning model for classification and regression analysis and is particularly useful for training single-hidden-layer neural networks. Nevertheless, its learning process is not fully effective (i.e. optimised), because the weights between the input and hidden layers are selected at random. In this study, the ELM is selected as the learning model for LID based on standard feature extraction. One of the ELM optimisation approaches, the Self-Adjusting Extreme Learning Machine (SA-ELM), is selected as the benchmark and improved by altering the selection phase of its optimisation process. The selection process incorporates both the Split-Ratio and K-Tournament methods, and the improved SA-ELM is named the Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated using LID datasets created from eight different languages. They show that the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) outperforms the SA-ELM LID, achieving an accuracy of 96.25% compared with 95.00% for SA-ELM LID.
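To make the ELM training step and the role of a selection operator more concrete, the following is a minimal Python/NumPy sketch. It assumes the standard ELM formulation (random, untrained input-to-hidden weights and biases, a sigmoid activation, and output weights solved with the Moore-Penrose pseudoinverse) together with a generic K-Tournament selection operator of the kind used in evolutionary algorithms. It is not the authors' SA-ELM or ESA-ELM: the Split-Ratio step is omitted because the abstract does not define it, three synthetic Gaussian blobs stand in for real MFCC/SDC/i-vector features, and all names (elm_train, elm_predict, k_tournament, population, fitness) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_train(X, T, n_hidden, rng):
    """Standard ELM: random, untrained input weights; analytic output weights."""
    n_features = X.shape[1]
    # Input-to-hidden weights and biases are drawn at random and never updated;
    # this random draw is the step the abstract describes as not optimised.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)        # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T  # least-squares output weights (Moore-Penrose pseudoinverse)
    return W, b, beta


def elm_predict(X, W, b, beta):
    # Predicted class = index of the largest output unit.
    return np.argmax(sigmoid(X @ W + b) @ beta, axis=1)


def k_tournament(fitness, k, rng):
    """Generic K-Tournament selection (illustrative, not the paper's exact rule):
    draw k distinct candidates at random and return the index of the fittest."""
    contenders = rng.choice(len(fitness), size=k, replace=False)
    return int(contenders[np.argmax(np.asarray(fitness)[contenders])])


# Hypothetical stand-in data: three Gaussian blobs instead of real LID feature vectors.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
y = np.repeat(np.arange(3), 50)
T = np.eye(3)[y]  # one-hot targets

# Random 100/50 train/validation split.
idx = rng.permutation(len(X))
train, val = idx[:100], idx[100:]

# A small population of ELMs, each with a different random draw of input weights;
# validation accuracy serves as each candidate's fitness.
population = [elm_train(X[train], T[train], n_hidden=20, rng=rng) for _ in range(8)]
fitness = [float(np.mean(elm_predict(X[val], *m) == y[val])) for m in population]

parent = k_tournament(fitness, k=3, rng=rng)
print("validation accuracies:", fitness)
print("selected candidate:", parent)
```

Because each candidate's input weights come from a different random draw, the candidates can differ in accuracy; using that accuracy as fitness is what allows a selection operator such as K-Tournament to prefer better draws. In a full evolutionary search, the selected candidates would then be recombined or mutated to form the next generation.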

Highlights

  • Language Identification (LID) is the process of determining and classifying a natural spoken language from given content and datasets [1, 2].

  • The findings identify that the Kernel Extreme Learning Machine (KELM) and the ELM combined with a deep neural network (DNN) achieve the highest accuracy compared with the other baseline approaches.

  • This study enhances an existing ELM-based learning model, the Self-Adjusting Extreme Learning Machine (SA-ELM).



Introduction

Language Identification (LID) is the process of determining and classifying a natural spoken language from given content and datasets [1, 2]. It is performed using computational linguistics approaches and is applied in many contexts. These contexts include text categorisation of written text [3] and speech recognition of a recorded utterance [4] in an identified spoken language. LID is a challenging task because of the variations in the type of speech input and the difficulty of understanding how humans process and interpret speech in adverse conditions [5]. A broad classification has been used to split speech features into low-level and high-level features.
