Automatic language identification

K. P. Li, T. J. Edwards

doi:10.1121/1.2017728

Abstract

A study on automatic language identification has recently been completed. The study included 200 talkers from ten different languages. Data for five of the languages consisted of read speech from male talkers only, while data for the other five languages consisted of two‐way conversational speech from both male and female talkers. All data were band‐limited to 350 to 3000 Hz and digitized at 8 kHz. Using an automatic segmentation program developed at TRW [J. Acoust. Soc. Am. Suppl. 1 64, S179(A) (1978)], which provided a set of six acoustic‐phonetic labels, several forms of Markov‐type models and classification methods were developed to identify each language independently of the talker and context. Half of the talkers were used for training the models, while the other half were used for independent testing. Classification methods, statistics of the data, and identification results will be presented.
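
The abstract does not say which Markov-type models or classification rules were used, so the following is only a minimal sketch of one common possibility: a first-order Markov chain over segment labels for each language, with an utterance assigned to the language whose chain gives it the highest log-likelihood. The six label names, the add-alpha smoothing, the function names, and the toy training sequences are all assumptions made for illustration; none of them comes from the study.

```python
import math

# Hypothetical label inventory: the abstract mentions six acoustic-phonetic
# labels but does not name them, so placeholder symbols are used here.
LABELS = ["A", "B", "C", "D", "E", "F"]


def train_markov(sequences, labels=LABELS, alpha=1.0):
    """Estimate first-order transition log-probabilities.

    Add-alpha smoothing (an assumption, not from the abstract) keeps
    unseen transitions from getting zero probability.
    """
    counts = {a: {b: alpha for b in labels} for a in labels}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
    model = {}
    for a in labels:
        total = sum(counts[a].values())
        model[a] = {b: math.log(counts[a][b] / total) for b in labels}
    return model


def log_likelihood(model, seq):
    """Sum of transition log-probabilities along one label sequence."""
    return sum(model[prev][cur] for prev, cur in zip(seq, seq[1:]))


def identify(models, seq):
    """Return the language whose model scores the sequence highest."""
    return max(models, key=lambda lang: log_likelihood(models[lang], seq))


# Toy training data standing in for segmenter output from training talkers.
training = {
    "lang1": [list("ABABABAB"), list("ABABCABAB")],
    "lang2": [list("DDEEFFDD"), list("DEEFFDDE")],
}
models = {lang: train_markov(seqs) for lang, seqs in training.items()}

print(identify(models, list("ABAB")))
print(identify(models, list("DDEEFF")))
```

In a setup like the one described, the models would be trained on label sequences from one half of the talkers and `identify` would be applied only to sequences from the held-out half, so that the scores reflect talker-independent performance.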
