Toward new language adaptation for language identification

Etienne Barnard, Yonghong Yan

doi:10.1016/s0167-6393(97)00009-5

Abstract

We study the adaptation of an existing language-identification system to new languages using a limited amount of training data. The platform used for this study is the system recently developed (Yan and Barnard, 1995a, b) to exploit phonotactic constraints based on language-dependent phone recognition. Using a language model re-estimation technique based on probabilistic gradient descent, two new approaches and their combination are proposed and tested. These approaches all modify the phonotactic language models, so that they no longer equal the conventional maximum-likelihood estimate. The methods differ in how they resample information from the same amount of data. Experiments were conducted using the standard OGI_TS database (Muthusamy et al., 1992). For comparison, the baseline system (with traditional model estimation) was subjected to the same set of tests. Systems trained with different amounts of training data in the new languages were evaluated. The results demonstrate that the new methods improve adaptation to new languages relative to conventional model estimation. The success of the discriminative model shows that conventional model estimation is not optimal for language identification, and that improvements can be obtained by modifying the maximum-likelihood estimates of the language models.
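For orientation, the conventional baseline that the abstract contrasts against can be sketched roughly as follows: a bigram phonotactic model per language, estimated by counting phone transitions, with the language chosen by highest log-likelihood. This is only an illustrative sketch, not the authors' system. The function names, the toy phone inventory, the toy training strings, and the add-alpha smoothing are all assumptions introduced here, and the gradient-descent re-estimation described in the abstract is not shown.

```python
import math
from collections import Counter


def train_bigram(sequences, phones, alpha=1.0):
    """Estimate P(next | prev) by counting phone transitions.

    With alpha=0 this is the plain maximum-likelihood estimate; the
    add-alpha smoothing is an assumption made here to avoid log(0).
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    vocab = len(phones)
    return {
        (p, n): (pair_counts[(p, n)] + alpha) / (prev_counts[p] + alpha * vocab)
        for p in phones
        for n in phones
    }


def log_likelihood(model, seq):
    """Sum the log transition probabilities along one phone sequence."""
    return sum(math.log(model[(p, n)]) for p, n in zip(seq, seq[1:]))


def identify(models, seq):
    """Return the language whose phonotactic model scores seq highest."""
    return max(models, key=lambda lang: log_likelihood(models[lang], seq))


# Hypothetical toy data: four phones and two made-up "languages".
phones = ["a", "k", "s", "t"]
models = {
    "lang_A": train_bigram([list("kata"), list("taka")], phones),
    "lang_B": train_bigram([list("stst"), list("ssta")], phones),
}
print(identify(models, list("kat")))
```

In a setup like this, adapting to a new language with little data means estimating a new transition table from few counts, which is where departing from the raw maximum-likelihood estimate could plausibly help.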

Full Text