Abstract

This paper proposes Universal Background Model (UBM) fusion within the total variability, or i-vector, modeling framework, with application to language identification (LID). The total variability subspace, which is typically exploited to discriminate between the language classes of different speech recordings, is trained by combining the normalized Baum-Welch statistics of multiple UBMs. When the UBMs model a diverse set of feature representations, the method yields an i-vector representation that is more discriminative between the classes of interest. This approach is particularly useful for short-duration utterances, and it is a computationally less complex way to boost performance than system-level fusion. We assess the performance of UBM-fused total variability modeling on the task of robust language identification on short-duration utterances, as part of Phase-III of the DARPA RATS (Robust Automatic Transcription of Speech) program.

Index Terms: language identification, i-vector representation, short-duration, noise robustness, RATS
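
The combination of normalized Baum-Welch statistics from multiple UBMs can be illustrated with a minimal NumPy sketch. It is not taken from the paper: it assumes diagonal-covariance GMM UBMs, assumes that "normalized" means centering the first-order statistics on the UBM means and scaling by the UBM standard deviations, and assumes that "combining" means concatenating the per-UBM statistics into one supervector-sized input for total variability training. The function names `baum_welch_stats` and `fuse_stats` are hypothetical.

```python
import numpy as np

def baum_welch_stats(X, weights, means, variances):
    """Zeroth-order and normalized first-order statistics of frames X
    under a diagonal-covariance GMM (assumed UBM form).

    X: (T, D) frames; weights: (C,); means, variances: (C, D).
    """
    diff = X[:, None, :] - means[None, :, :]                      # (T, C, D)
    # Per-frame, per-component Gaussian log-likelihoods.
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                        # (T, C)
    # Normalize to component posteriors (responsibilities) per frame.
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    gamma = np.exp(log_post)

    N = gamma.sum(axis=0)                                         # (C,)
    F = gamma.T @ X                                               # (C, D)
    # Assumed normalization: center on UBM means, scale by UBM std. dev.
    F_norm = (F - N[:, None] * means) / np.sqrt(variances)
    return N, F_norm

def fuse_stats(feature_streams, ubms):
    """Concatenate per-UBM statistics (assumed fusion rule).

    feature_streams: one (T_k, D_k) array per UBM, e.g. different
    front-end features extracted from the same utterance.
    ubms: one (weights, means, variances) tuple per UBM.
    """
    Ns, Fs = [], []
    for X, (w, mu, var) in zip(feature_streams, ubms):
        N, F = baum_welch_stats(X, w, mu, var)
        Ns.append(N)
        Fs.append(F.ravel())
    return np.concatenate(Ns), np.concatenate(Fs)

def random_ubm(rng, n_components, dim):
    """Toy UBM parameters for illustration only (not trained)."""
    w = rng.random(n_components)
    w /= w.sum()
    mu = rng.normal(size=(n_components, dim))
    var = rng.random((n_components, dim)) + 0.5
    return w, mu, var

rng = np.random.default_rng(0)
ubm_a = random_ubm(rng, 3, 4)      # hypothetical UBM over 4-dim features
ubm_b = random_ubm(rng, 5, 6)      # hypothetical UBM over 6-dim features
X_a = rng.normal(size=(50, 4))     # 50 frames of feature stream A
X_b = rng.normal(size=(50, 6))     # 50 frames of feature stream B
N_fused, F_fused = fuse_stats([X_a, X_b], [ubm_a, ubm_b])
```

Under these assumptions, the fused statistics would stand in for the single-UBM statistics when estimating the total variability matrix, so only one i-vector extractor is trained rather than one system per feature representation.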
