Sample selection for automatic language identification

David Farris, Chris White, Sanjeev Khudanpur

doi:10.1109/icassp.2008.4518587

Abstract

Current approaches to automatic spoken language identification (LID) assume the availability of a large corpus of manually language-labeled speech samples for training statistical classifiers. We investigate two active learning methods that significantly reduce the amount of labeled speech needed to train LID systems. Starting with a small training set, an automated method selects samples from a corpus of unlabeled speech, which are then labeled and added to the training pool. One selection method is based on a previously known entropy criterion, and the other on a novel likelihood-ratio criterion. We demonstrate LID performance comparable to that obtained with a large training corpus using only a tenth of the training data. A further 40% improvement in LID performance is obtained using a third of the training data. Finally, we show that our novel selection method is more robust to variance in the unlabeled pool than the entropy-based method.
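
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of either criterion, so everything below is an assumption made for illustration: the entropy score is the Shannon entropy of the language posterior under a uniform prior, and the likelihood-ratio score is the log-likelihood gap between the two best-scoring languages. The function names, the `pool` dictionary, and the utterance scores are hypothetical and are not taken from the paper.

```python
import math

def posterior(log_likelihoods):
    """Turn per-language log-likelihoods into posteriors (uniform prior assumed)."""
    m = max(log_likelihoods)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_score(log_likelihoods):
    """Shannon entropy of the language posterior; higher means more uncertain."""
    return -sum(q * math.log(q) for q in posterior(log_likelihoods) if q > 0.0)

def llr_margin(log_likelihoods):
    """Log-likelihood ratio between the two best languages (assumed criterion);
    smaller means the classifier is less decided."""
    top, second = sorted(log_likelihoods, reverse=True)[:2]
    return top - second

def select_by_entropy(pool, k):
    """Pick the k utterances with the highest posterior entropy."""
    return sorted(pool, key=lambda u: entropy_score(pool[u]), reverse=True)[:k]

def select_by_llr(pool, k):
    """Pick the k utterances with the smallest top-two log-likelihood ratio."""
    return sorted(pool, key=lambda u: llr_margin(pool[u]))[:k]

# Hypothetical unlabeled pool: utterance id -> log-likelihood under each language model.
pool = {
    "utt_a": [-10.0, -30.0, -35.0],  # confidently one language
    "utt_b": [-20.0, -20.5, -40.0],  # two languages nearly tied
    "utt_c": [-15.0, -18.0, -18.0],  # moderately confident
}

to_label_entropy = select_by_entropy(pool, 1)
to_label_llr = select_by_llr(pool, 2)
```

In an active learning loop of the kind the abstract outlines, the selected utterances would be sent for manual labeling, added to the training pool, and the classifier retrained before the next selection round.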
