Native Language Identification from Spoken Indian English

Siddika Imani, Parismita Sarma, K Samudravijaya

doi:10.37591/.v9i2.3253

Abstract

Automatic speech recognition (ASR) systems that facilitate voice-based search and information retrieval have gained importance recently. Although the performance of ASR systems for Indian languages has improved in the recent past, they have yet to gain as wide acceptance as ASR systems for English spoken by Indians. Almost all Indians learn English as a second or third language. As a result, the phoneme set and prosody of a speaker's native language influence the acoustic characteristics of their spoken English. Since Indians speak a wide variety of languages, the acoustic characteristics of English spoken by Indians vary considerably. Thus, the recognition accuracy of Indian English could be improved by employing native-language-dependent English ASR systems. This approach requires automatic identification of the native language of the speaker. Here, we report the performance of an automatic Native Language Identification (NLI) system that recognises the native language of the speaker as Assamese, Bengali or Bodo after analysing an English sentence spoken by the speaker. Training and performance evaluation of an NLI system need appropriate linguistic resources. These include (a) English speech data from several native speakers of each of the three languages, (b) corresponding word-level transcriptions and (c) a pronunciation dictionary. While pronunciation dictionaries for English are freely available, recordings of English spoken by speakers of these three languages, and their transcriptions, are not publicly available. We therefore created a relevant speech database. We recorded English spoken by native speakers, both male and female, of these three scheduled languages. Each speaker read 100 sentences out of a set of 700 English sentences; these were either proverbs or digit sequences. Each sentence contained 5 to 8 words. The speech was recorded under ambient conditions using a laptop and digitised at a 16,000 Hz sampling rate, 16 bits per sample, mono.
The database contains spoken English from 35 native Assamese speakers, 33 Bengali speakers and 30 Bodo speakers. To carry out a threefold evaluation of system performance, the speakers of each language were grouped into three subsets containing nearly equal numbers of speakers. In each fold, one subset was designated as test data, and the remaining two subsets were used to train the system. We used Kaldi, an open-source ASR toolkit, to implement the NLI system. As a first step in developing the NLI system, we implemented three English ASR systems, each trained on data from speakers of one of the three languages: Assamese, Bengali and Bodo. Each phone was represented by a three-state Hidden Markov Model (HMM), and each HMM state was associated with a Gaussian mixture model. We used Mel-frequency cepstral coefficients and their temporal derivatives as features, and a bigram language model. To identify the native language of a speaker, the test speech file was fed to each of the three ASR systems. An ASR system generates not only the decoded word sequence but also the corresponding log-likelihood. The NLI system follows the maximum likelihood criterion: the language corresponding to the ASR system that yielded the highest likelihood for the test speech was declared the native language of the speaker. The overall accuracy of the NLI system was computed as the unweighted average recall, derived from the confusion matrix. The NLI accuracy, averaged over the three cross-validation folds, was 59% for test speech of just 3 seconds. Confusion was greatest between Assamese and Bengali, as both are closely related members of the Indo-Aryan language family. In contrast, Bodo belongs to the Sino-Tibetan language family.
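The maximum likelihood decision and the unweighted average recall described above can be illustrated with a minimal Python sketch. It is not the authors' Kaldi implementation: the function names are hypothetical, and the log-likelihood values and labels in the usage lines are made-up placeholders standing in for the scores that the three language-specific ASR systems would produce.

```python
LANGUAGES = ("Assamese", "Bengali", "Bodo")


def identify_native_language(log_likelihoods):
    """Pick the language whose ASR system gave the highest log-likelihood.

    `log_likelihoods` maps a language name to the log-likelihood that the
    ASR system trained on that language's speakers assigned to the utterance.
    """
    return max(log_likelihoods, key=log_likelihoods.get)


def confusion_matrix(true_labels, predicted_labels, labels=LANGUAGES):
    """Count utterances as counts[true_language][predicted_language]."""
    counts = {t: {p: 0 for p in labels} for t in labels}
    for true_label, predicted_label in zip(true_labels, predicted_labels):
        counts[true_label][predicted_label] += 1
    return counts


def unweighted_average_recall(matrix):
    """Average the per-language recalls, giving every language equal weight."""
    recalls = []
    for true_label, row in matrix.items():
        total = sum(row.values())
        if total > 0:  # skip languages with no test utterances
            recalls.append(row[true_label] / total)
    if not recalls:
        raise ValueError("confusion matrix contains no test utterances")
    return sum(recalls) / len(recalls)


# Illustrative values only (not results from the paper).
scores = {"Assamese": -4210.5, "Bengali": -4198.2, "Bodo": -4301.7}
print(identify_native_language(scores))

true = ["Assamese", "Assamese", "Bengali", "Bengali", "Bodo", "Bodo"]
pred = ["Assamese", "Bengali", "Bengali", "Bengali", "Bodo", "Assamese"]
print(unweighted_average_recall(confusion_matrix(true, pred)))
```

Averaging the per-language recalls, rather than counting correct utterances overall, keeps a language with more test speakers from dominating the reported accuracy.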
We discuss the performance of the NLI system using different models, such as context-dependent and context-independent HMMs, employing either a Gaussian mixture model or a deep neural network to estimate the likelihood of a feature vector emitted from an HMM state.

Keywords: Automatic identification, automatic speech recognition, native language identification, voice-based search, information retrieval
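The speaker-level threefold evaluation mentioned in the abstract can be sketched as follows. The abstract does not say how speakers were assigned to subsets, so the round-robin assignment here is an assumption, and the speaker IDs and function names are hypothetical.

```python
def split_speakers_into_folds(speakers, n_folds=3):
    """Deal speakers round-robin into n_folds subsets of near-equal size.

    Assumption: the assignment rule is illustrative; the source only states
    that the subsets contain nearly equal numbers of speakers.
    """
    folds = [[] for _ in range(n_folds)]
    for index, speaker in enumerate(sorted(speakers)):
        folds[index % n_folds].append(speaker)
    return folds


def cross_validation_splits(folds):
    """Yield (train_speakers, test_speakers), holding out each fold once."""
    for held_out in range(len(folds)):
        test = list(folds[held_out])
        train = [
            speaker
            for i, fold in enumerate(folds)
            if i != held_out
            for speaker in fold
        ]
        yield train, test


# Hypothetical IDs for the 35 Assamese speakers in the database.
assamese_speakers = [f"asm_{n:02d}" for n in range(1, 36)]
folds = split_speakers_into_folds(assamese_speakers)

for train, test in cross_validation_splits(folds):
    # No speaker appears in both the training and the test set of a fold.
    assert not set(train) & set(test)
    print(len(train), len(test))
```

Splitting by speaker rather than by utterance means the system is always tested on voices it has not seen during training.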

More From: Trends in Electrical Engineering
