Abstract

Articulatory feature-based conditional pronunciation modeling (AFCPM) aims to capture the pronunciation characteristics of speakers by modeling the linkage between the states of articulation during speech production and the actual phones produced by a speaker. Previous AFCPM systems use one discrete density function per phoneme to model the pronunciation characteristics of speakers. This paper proposes using a mixture of discrete density functions for AFCPM. In particular, the pronunciation characteristics of each phoneme are modeled by two density functions: one responsible for describing the articulatory features that are more relevant to vowels and the other for consonants. Verification scores are the weighted sum of the outputs of the two models. To enhance the resolution of the pronunciation models, four articulatory properties (front-back, lip-rounding, place of articulation, and manner of articulation) are used for pronunciation modeling. The proposed AFCPM is applied to a speaker verification task. Results show that using four articulatory features achieves a lower error rate than using only two features (manner and place of articulation). It was also found that dividing the articulatory properties into two groups is an effective means of mitigating the data-sparseness problem encountered in the training phase of AFCPM systems.
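The two-group scoring idea in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the function names, the smoothing scheme, and the single mixture weight `w` are all assumptions introduced here; the paper specifies only that each phoneme has two discrete densities (one for vowel-related properties, one for consonant-related properties) and that the verification score is a weighted sum of the two model outputs.

```python
import numpy as np

def discrete_density(counts, smoothing=1.0):
    """Turn raw phoneme/feature-state co-occurrence counts into a
    smoothed discrete density (add-one style smoothing is an assumption
    made here to cope with sparse training data)."""
    counts = np.asarray(counts, dtype=float) + smoothing
    return counts / counts.sum()

def afcpm_score(vowel_density, consonant_density,
                vowel_state, consonant_state, w=0.5):
    """Weighted sum of the two per-phoneme model outputs for one phone
    observation: one density over vowel-related articulatory states
    (front-back, lip-rounding), one over consonant-related states
    (place and manner of articulation)."""
    return w * vowel_density[vowel_state] + (1.0 - w) * consonant_density[consonant_state]

# Toy example: 4 vowel-feature states and 6 consonant-feature states
# for a single (hypothetical) phoneme.
pv = discrete_density([10, 3, 1, 2])
pc = discrete_density([5, 1, 0, 2, 8, 4])
score = afcpm_score(pv, pc, vowel_state=0, consonant_state=4, w=0.6)
```

In a full system, scores like this would typically be accumulated over all phone observations in an utterance and compared against a background model; those steps are omitted here.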
