A Possibilistic Approach for Building Statistical Language Models

Saeedeh Momtazi, Hossein Sameti

doi:10.1109/isda.2009.197

Abstract

Class-based n-gram language models are the models most frequently used in continuous speech recognition systems, especially for languages that lack richly annotated corpora. Various word clustering algorithms have been proposed to build such class-based models. In this work, we discuss the superiority of soft approaches to class construction, whereby each word can be assigned to more than one class. We also propose a new method for possibilistic word clustering, using the possibilistic C-means algorithm as our clustering method. We investigate several parameters of this algorithm, including centroid initialization, the distance measure, and the words’ feature vectors. In the experiments reported here, the algorithm is applied to the 20,000 most frequent Persian words, and the language model built from the resulting clusters is evaluated on its perplexity and on the accuracy of a continuous speech recognition system. Our results indicate a 10% reduction in perplexity and a 4% reduction in word error rate.
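To make the idea of possibilistic (soft) cluster assignment concrete, the following is a minimal sketch of one iteration of the standard possibilistic C-means update, not the authors' implementation. The squared Euclidean distance, the fuzzifier `m = 2`, the scale parameters `etas`, and the toy two-dimensional feature vectors are all assumptions chosen for illustration; the abstract does not state which distance measure or feature vectors performed best.

```python
import math


def squared_euclidean(a, b):
    """Squared Euclidean distance (an assumed choice of distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def typicality(dist_sq, eta, m=2.0):
    """Possibilistic membership of a point at squared distance dist_sq
    from a centroid whose scale parameter is eta."""
    return 1.0 / (1.0 + (dist_sq / eta) ** (1.0 / (m - 1.0)))


def pcm_step(points, centroids, etas, m=2.0):
    """One possibilistic C-means iteration: typicalities, then centroids."""
    # u[i][j] is the typicality of point j in cluster i. Unlike fuzzy
    # C-means, the values for one point need not sum to 1 across clusters.
    u = [[typicality(squared_euclidean(p, c), eta, m) for p in points]
         for c, eta in zip(centroids, etas)]
    dim = len(points[0])
    new_centroids = []
    for row in u:
        weights = [t ** m for t in row]
        total = sum(weights)  # always positive, since typicality > 0
        new_centroids.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ])
    return u, new_centroids


# Toy "word feature vectors" (hypothetical values, not from the paper).
points = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
etas = [0.5, 0.5]
u, centroids = pcm_step(points, centroids, etas)
```

Because each typicality depends only on the distance to its own centroid, a word can have high membership in several classes at once, or low membership in all of them, which is the property that distinguishes the possibilistic approach from hard clustering.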

Similar Papers

A New Word Clustering Method for Building N-Gram Language Models in Continuous Speech Recognition Systems
Mohammad Bahrani ... Nazila Hafezi
18 Jun 2008

Random Clusterings for Language Modeling
A Emami ... F Jelinek
18 Mar 2005

Neural network based language models for highly inflective languages
Tomas Mikolov ... Jan Cernocky
01 Jan 2009

Adapted language modeling for recognition of retelling story in language learning
Meng Chen ... Yang Song
01 Jul 2012
