Abstract

We study the adaptation of an existing language-identification system to new languages using a limited amount of training data. The platform for this study is the recently developed system (Yan and Barnard 1995a, b) that exploits phonotactic constraints based on language-dependent phone recognition. Using a proposed language-model re-estimation technique based on probabilistic gradient descent, two new approaches and their combination are proposed and tested. All of these approaches modify the phonotactic language models so that they no longer equal the conventional maximum-likelihood estimates. The methods differ in how they resample the information contained in the same amount of data. Experiments were conducted on the standard OGI_TS database (Muthusamy et al., 1992). For comparison, the baseline system (with traditional model estimation) was subjected to the same set of tests. Systems trained with different amounts of training data in the new languages were evaluated. The results demonstrate that, compared with conventional model estimation, the new methods improve adaptation to new languages. The success of the discriminative model shows that conventional model estimation is not optimal for language identification, and that improvements can be obtained by modifying the maximum-likelihood estimates of the language models.
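To make the baseline concrete, the following is a minimal sketch of the conventional approach that the abstract contrasts against: one phone-bigram language model per language, estimated by maximum likelihood, with an utterance assigned to the language whose model gives its phone sequence the highest log-likelihood. It is not the authors' implementation. The function names, the toy phone inventory, the toy training strings, and the add-alpha smoothing are all assumptions made for illustration, and a single shared phone set is used for simplicity, whereas the system described above uses language-dependent phone recognizers. The gradient-descent re-estimation itself is not shown, since the abstract does not give its details.

```python
import math
from collections import Counter

def train_bigram(sequences, phones, alpha=1.0):
    """Estimate log P(cur | prev) from phone sequences.

    With alpha=0 this is the plain maximum-likelihood estimate; the
    add-alpha smoothing is an assumption so unseen bigrams keep
    nonzero probability.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    v = len(phones)
    return {
        (p, c): math.log(
            (pair_counts[(p, c)] + alpha) / (context_counts[p] + alpha * v)
        )
        for p in phones
        for c in phones
    }

def log_likelihood(seq, model):
    """Sum of bigram log-probabilities over one phone sequence."""
    return sum(model[(p, c)] for p, c in zip(seq, seq[1:]))

def identify(seq, models):
    """Return the language whose model scores the sequence highest."""
    return max(models, key=lambda lang: log_likelihood(seq, models[lang]))

# Hypothetical toy data: each character stands for one recognized phone.
phones = ["a", "k", "s", "t"]
models = {
    "lang_A": train_bigram([list("kata"), list("taka"), list("saka")], phones),
    "lang_B": train_bigram([list("stak"), list("tsak"), list("kast")], phones),
}

guess_a = identify(list("kata"), models)
guess_b = identify(list("stak"), models)
```

In this framing, the methods studied in the paper would replace the counts-based values in each model with re-estimated ones, while the scoring and decision steps stay the same.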
