Abstract

Two important challenges for speaker recognition applications are noise robustness and portability to new languages. We present an approach that integrates multiple components and models for improved speaker identification in spontaneous Arabic speech under adverse acoustic conditions. We used two different acoustic speaker models, cepstral Gaussian mixture models (GMMs) and maximum likelihood linear regression support vector machine (MLLR-SVM) models, combined by a neural network. The noise-robust components are Wiener filtering, speech/nonspeech segmentation, and frame selection. We present baselines and results on the Arabic portion of the NIST Mixer data, both in clean conditions and with noise added at different signal-to-noise ratios. We used two realistic noise types: babble and city traffic. In both noisy scenarios, we found significant equal error rate (EER) reductions over the no-compensation condition. The various noise robustness methods gave complementary gains for both acoustic models. Finally, the combiner reduces EER over the individual systems in noisy conditions.
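The evaluation metric used throughout is the equal error rate: the operating point at which the false accept rate (impostor trials scored above threshold) equals the false reject rate (genuine trials scored below it). As a minimal illustration, independent of the paper's actual scoring systems, the EER can be computed from two sets of hypothetical trial scores as follows:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: the point where false accept rate (FAR) equals
    false reject rate (FRR). Higher score = more speaker-like."""
    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # Pick the threshold where the two rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Hypothetical scores, for illustration only (not from the paper).
genuine = np.array([2.1, 1.8, 2.5, 0.9, 1.2])
impostor = np.array([0.2, -0.5, 1.0, 0.4, -0.1])
print(equal_error_rate(genuine, impostor))  # prints 0.2
```

In practice the scores on the left would come from a GMM or MLLR-SVM system (or their fused output), and the compensation methods above aim to lower this EER under noise.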
