Language Identification Accuracy Research Articles

Research and development of speech technology applications in low-resource languages (LRLs) are challenging because suitable speech corpora are not available. For most Indian languages in particular, the data found in digital sources is sparse in both amount and type, and prior work is too limited to support large-scale development. This paper describes the creation of an LRL corpus comprising sixteen rarely studied Eastern and Northeastern (E&NE) Indian languages and characterizes its data variability with several statistics. Experiments are then carried out on the collected corpus to build baseline speaker identification (SID) and language identification (LID) systems for acceptance evaluation. To investigate the presence of speaker- and language-specific information, spectral features such as Mel frequency cepstral coefficients (MFCCs), shifted delta cepstral (SDC) coefficients, and relative spectral transform-perceptual linear prediction (RASTA-PLP) features are used. Vector quantization (VQ), Gaussian mixture model (GMM), support vector machine (SVM), and multilayer perceptron (MLP)-based models are developed to represent the speaker- and language-specific information captured by these features. In addition, SID and LID models based on i-vectors, time delay neural networks (TDNNs), and recurrent neural networks with long short-term memory (LSTM-RNNs) are evaluated to reflect recent approaches. The performance of the developed systems on the LRL corpus is reported as SID and LID accuracy. The best baseline SID and LID accuracies, 94.49% and 95.69% respectively, are obtained with LSTM-RNN models using MFCC + SDC features.
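
The abstract names MFCC + SDC as the best-performing feature set but does not give the SDC configuration. As a minimal sketch, assuming the commonly used N-d-P-k parameterization with 7-1-3-7 defaults and edge padding at utterance boundaries, SDC vectors can be stacked from per-frame cepstra as follows. The function names are hypothetical, and MFCC extraction itself is assumed to happen upstream.

```python
from typing import List

Frame = List[float]


def shifted_delta_cepstra(cepstra: List[Frame], n: int = 7, d: int = 1,
                          p: int = 3, k: int = 7) -> List[Frame]:
    """Compute N-d-P-k shifted delta cepstra (hypothetical helper).

    For frame t and block i in 0..k-1, the i-th delta is
    c[t + i*p + d] - c[t + i*p - d], using only the first n coefficients.
    The k deltas are concatenated, giving n * k values per frame.
    """
    num_frames = len(cepstra)
    if num_frames == 0:
        return []

    def frame(idx: int) -> Frame:
        # Assumption: out-of-range indices are clamped to the first/last
        # frame (edge padding); implementations differ on this point.
        return cepstra[min(max(idx, 0), num_frames - 1)][:n]

    sdc: List[Frame] = []
    for t in range(num_frames):
        stacked: Frame = []
        for i in range(k):
            ahead = frame(t + i * p + d)
            behind = frame(t + i * p - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc


def mfcc_plus_sdc(cepstra: List[Frame], **sdc_params: int) -> List[Frame]:
    """Append each frame's SDC vector to its original cepstra (MFCC + SDC)."""
    sdc = shifted_delta_cepstra(cepstra, **sdc_params)
    return [list(c) + s for c, s in zip(cepstra, sdc)]
```

With 13-dimensional MFCC frames and the assumed 7-1-3-7 setting, each SDC vector has 49 values and the combined MFCC + SDC vector has 62.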

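
Both abstracts report GMM-based LID systems. The usual decision rule trains one GMM per language and assigns a test utterance to the language whose model gives the highest average per-frame log-likelihood. The sketch below assumes diagonal covariances and already-trained parameters (EM training is not shown); the names are illustrative, not the authors' code. The same rule applies to SID with one model per speaker.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed model format: a diagonal-covariance GMM as a list of
# (weight, mean vector, variance vector) components.
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def frame_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """log sum_c w_c N(x | mu_c, Sigma_c), using log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def average_log_likelihood(frames: List[Sequence[float]], gmm: GMM) -> float:
    """Mean per-frame log-likelihood of an utterance under one model."""
    return sum(frame_log_likelihood(x, gmm) for x in frames) / len(frames)


def identify_language(frames: List[Sequence[float]],
                      models: Dict[str, GMM]) -> str:
    """Return the language whose GMM scores the utterance highest."""
    return max(models,
               key=lambda lang: average_log_likelihood(frames, models[lang]))
```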

In this work, the linear prediction (LP) residual signal is parameterized to capture excitation source information for language identification (LID). The LP residual is processed at three levels (sub-segmental, segmental, and supra-segmental) to capture different aspects of language-specific excitation source information. The proposed excitation source features are evaluated on 27 Indian languages from the Indian Institute of Technology Kharagpur-Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC), as well as on the Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) and National Institute of Standards and Technology Language Recognition Evaluation (NIST LRE) 2011 corpora. LID systems are developed using Gaussian mixture model (GMM) and i-vector based approaches. Experimental results show that segmental-level parametric features provide better identification accuracy (62%) than sub-segmental (40%) and supra-segmental (34%) features. Because the excitation source features from the three levels carry distinct language-specific evidence, their scores are combined to obtain the complete excitation source information for the LID task. LID performance obtained with the excitation source features is compared with that of the vocal tract system features. Finally, the scores from the vocal tract and excitation source features are combined to further improve LID accuracy. The best recognition accuracies obtained from the stage-IV integrated LID systems I, II, and III are 69%, 70%, and 72%, respectively.
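
The LP residual is the error left after inverse filtering a speech frame with its own LP coefficients, so it mostly reflects the excitation source rather than the vocal tract. A minimal sketch, assuming the autocorrelation method with Levinson-Durbin recursion, a hypothetical default order of 10, and no pre-emphasis or windowing. The paper's sub-segmental, segmental, and supra-segmental parameterization of this residual is not reproduced here.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Solve for predictor coefficients a[1..order], where
    x[n] is approximated by sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. digital silence): keep zeros
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error power after stage i
    return a[1:]


def lp_residual(frame: Sequence[float], order: int = 10) -> List[float]:
    """Inverse-filter the frame: e[n] = x[n] - sum_k a[k] * x[n - k]."""
    a = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(a[k - 1] * frame[n - k]
                         for k in range(1, order + 1) if n - k >= 0)
        residual.append(sample - prediction)
    return residual
```

For a signal that is perfectly predictable from its past, the residual is close to zero after the first sample; in speech, the remaining peaks largely mark excitation events such as glottal closures.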

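
The second abstract combines scores from the three excitation levels, and then from the excitation source and vocal tract systems, but does not state the combination rule. One common option, assumed here, is to z-normalize each system's per-language scores for a test utterance and take a weighted sum. The weights are hypothetical tuning parameters, and the function names are illustrative.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Scores from one system for one test utterance: language -> score.
Scores = Dict[str, float]


def z_normalize(scores: Scores) -> Scores:
    """Shift and scale one system's scores to zero mean, unit spread."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # all scores equal: avoid dividing by zero
    return {lang: (s - mu) / sigma for lang, s in scores.items()}


def fuse(systems: List[Scores], weights: List[float]) -> Scores:
    """Weighted sum of z-normalized scores (assumed fusion rule)."""
    normalized = [z_normalize(s) for s in systems]
    languages = systems[0].keys()
    return {lang: sum(w * z[lang] for w, z in zip(weights, normalized))
            for lang in languages}


def decide(fused: Scores) -> str:
    """Pick the language with the highest fused score."""
    return max(fused, key=fused.get)
```

Because `fuse` returns the same score mapping it consumes, the two-stage combination can be expressed by first fusing the three excitation-level score sets and then fusing that result with the vocal tract scores.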

Articles published on Language Identification Accuracy

Spoken Language Identification using CNN with Log Mel Spectrogram Features in Indian Context

Integration of Phonotactic Features for Language Identification on Code-Switched Speech

Multilingual Speech Corpus in Low-Resource Eastern and Northeastern Indian Languages for Speaker and Language Identification

Language Recognition using Neural Phone Embeddings and RNNLMs

On Hierarchical Text Language-Identification Algorithms

Language identification using phase information

Parametric representation of excitation source information for language identification

Spoken language recognition-a step toward multilinguality in speech processing
