Abstract

This paper investigates approaches to data sampling and dimensionality reduction for discriminative language models (DLMs). As a feature-based language modeling approach, a DLM aims to rerank the ASR output with discriminatively trained feature parameters. Using a Turkish morphology-based feature set, we examine online Principal Component Analysis (PCA) as a dimensionality reduction method. We exploit the ranking perceptron and ranking SVM as two alternative discriminative modeling techniques, and apply data sampling to improve their efficiency. We obtain a 0.4% reduction in word error rate (WER) over the baseline perceptron result, significant at p < 0.001.
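As an illustration of the online PCA idea mentioned above, the sketch below uses Oja's rule, a classic streaming estimator of the top principal direction. This is an illustrative stand-in, not the paper's exact variant: the synthetic data, learning rate, and dimensions are all assumptions chosen only to show how the estimate is updated one sample at a time, so the full feature matrix never needs to be held in memory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic centered data with one dominant direction (hypothetical toy setup).
d = 20
true_dir = np.zeros(d)
true_dir[:2] = 1.0
true_dir /= np.linalg.norm(true_dir)
X = rng.normal(size=(5000, d)) + 3.0 * rng.normal(size=(5000, 1)) * true_dir

# Oja's rule: stream over samples, updating the principal-direction estimate.
w = rng.normal(size=d)
w /= np.linalg.norm(w)
eta = 0.01  # learning rate (assumed value for this toy example)
for x in X:
    y = x @ w
    w += eta * y * (x - y * w)  # Oja's update step
    w /= np.linalg.norm(w)      # keep the estimate unit-length

# The learned direction should align closely with the dominant component.
alignment = abs(w @ true_dir)
```

For a high-dimensional morphological feature set, the same one-sample-at-a-time update pattern is what makes online PCA attractive over batch PCA, which would require materializing and decomposing the full covariance matrix.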
