Abstract

Native language identification (NLI) is the task of identifying the first language of a user from their speech or written text in a second language. In this paper, we propose the use of spectrogram- and cochleagram-based features extracted from very short speech utterances (0.8 s on average) to infer the native language of an Urdu speaker. Bidirectional long short-term memory (BLSTM) neural networks are adopted to classify the utterances by native language. A set of experiments is carried out to search for the network architecture, and the system's accuracy is evaluated on the validation data set. Overall accuracies of 74.81% and 71.61% are achieved using Mel-frequency cepstral coefficients (MFCC) and Gammatone frequency cepstral coefficients (GFCC), respectively. Moreover, the optimized MFCC-based and GFCC-based BLSTM networks are merged to take advantage of both feature sets. The experiments show that the merged network outperforms the individual BLSTM networks, achieving an accuracy of 75.69% on the evaluation data. The effect of test-utterance duration (from 0.27 s to 1.5 s) is also analyzed, and an accuracy of over 50% is achieved with utterances as short as 0.4 s.
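The idea of merging an MFCC-based and a GFCC-based BLSTM can be sketched as a NumPy forward pass. This is a minimal, untrained illustration, and several of its choices are assumptions not stated in the abstract: 13 coefficients per frame for both feature sets, 32 hidden units per direction, 5 native-language classes, and merging by concatenating the two branches' utterance vectors before a single softmax layer. The class names (`LSTMDirection`, `BLSTMEncoder`, `MergedBLSTM`) are hypothetical. The paper's actual merging strategy, layer sizes and training procedure may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

class LSTMDirection:
    """One LSTM direction with random (untrained) weights; hypothetical helper."""
    def __init__(self, n_in, n_hidden, rng):
        s = 1.0 / np.sqrt(n_hidden)
        # Rows stacked as [input gate, forget gate, cell candidate, output gate].
        self.W = rng.uniform(-s, s, (4 * n_hidden, n_in))
        self.U = rng.uniform(-s, s, (4 * n_hidden, n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.H = n_hidden

    def run(self, x):
        """x: (T, n_in) frame sequence -> (T, H) hidden states in input order."""
        H = self.H
        h, c = np.zeros(H), np.zeros(H)
        states = []
        for x_t in x:
            z = self.W @ x_t + self.U @ h + self.b
            i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
            g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
            c = f * c + i * g
            h = o * np.tanh(c)
            states.append(h)
        return np.stack(states)

class BLSTMEncoder:
    """Bidirectional LSTM that maps a variable-length utterance to a fixed vector."""
    def __init__(self, n_in, n_hidden, rng):
        self.fwd = LSTMDirection(n_in, n_hidden, rng)
        self.bwd = LSTMDirection(n_in, n_hidden, rng)

    def encode(self, x):
        h_f = self.fwd.run(x)              # forward pass over frames
        h_b = self.bwd.run(x[::-1])[::-1]  # backward pass, re-aligned to time order
        # Last forward state and first backward state have each seen every frame.
        return np.concatenate([h_f[-1], h_b[0]])

class MergedBLSTM:
    """Assumed fusion: concatenate the MFCC and GFCC branch vectors, then softmax."""
    def __init__(self, n_mfcc, n_gfcc, n_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.mfcc_branch = BLSTMEncoder(n_mfcc, n_hidden, rng)
        self.gfcc_branch = BLSTMEncoder(n_gfcc, n_hidden, rng)
        self.W_out = rng.normal(0.0, 0.1, (n_classes, 4 * n_hidden))
        self.b_out = np.zeros(n_classes)

    def predict_proba(self, mfcc, gfcc):
        v = np.concatenate([self.mfcc_branch.encode(mfcc),
                            self.gfcc_branch.encode(gfcc)])
        return softmax(self.W_out @ v + self.b_out)

# Usage with random stand-in features (not real MFCC/GFCC data).
# 80 frames is roughly 0.8 s, assuming a 10 ms frame hop.
rng = np.random.default_rng(1)
mfcc = rng.normal(size=(80, 13))
gfcc = rng.normal(size=(80, 13))
model = MergedBLSTM(n_mfcc=13, n_gfcc=13, n_hidden=32, n_classes=5)
probs = model.predict_proba(mfcc, gfcc)
print("predicted class index:", int(np.argmax(probs)))
```

Each branch reduces a variable-length frame sequence to a fixed-size vector. For that reason, the same sketch accepts utterances of any duration, which is what a duration sweep such as 0.27 s to 1.5 s requires. Because the weights here are untrained, the predicted class is meaningless until the model is fitted to labelled data.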
