A New Word Clustering Method for Building N-Gram Language Models in Continuous Speech Recognition Systems

Mohammad Bahrani, Saeedeh Momtazi, Hossein Sameti, Nazila Hafezi

doi:10.1007/978-3-540-69052-8_30

Abstract

This paper presents a new method for automatic word clustering and applies it to building n-gram language models for continuous speech recognition (CSR) systems. Each word is represented by a feature vector that captures the statistics of the parts of speech (POS) of that word, and the feature vectors are clustered with the k-means algorithm. The method reduces the high time complexity that is a drawback of other automatic clustering methods, and it alleviates the high perplexity associated with manual clustering methods. The experiments are based on the Persian Text Corpus, which contains about 9 million words. The extracted language models are evaluated by the perplexity criterion, and the results show a considerable reduction in perplexity. The word error rate of the CSR system is also reduced by about 16% compared with a manual clustering method.
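The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the POS statistics are normalized, which distance is used, or how k-means is initialized, so the relative-frequency vectors, squared Euclidean distance, and deterministic initialization below are all assumptions. The function names and the toy English corpus are hypothetical and stand in for the Persian Text Corpus.

```python
from collections import Counter, defaultdict


def pos_feature_vectors(tagged_corpus, tagset):
    """Map each word to the relative frequency of each POS tag it occurs with.

    Assumption: relative frequencies are used as the "POS statistics".
    """
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    vectors = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        vectors[word] = [tag_counts[t] / total for t in tagset]
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means over word vectors; returns a word -> cluster index map."""
    words = sorted(vectors)
    # Assumption: initialize with the first k distinct vectors (deterministic).
    centroids = []
    for w in words:
        if vectors[w] not in centroids:
            centroids.append(list(vectors[w]))
        if len(centroids) == k:
            break
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each word goes to its nearest centroid.
        new_assignment = {
            w: min(range(len(centroids)),
                   key=lambda c: squared_distance(vectors[w], centroids[c]))
            for w in words
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy data: (word, POS tag) pairs.
tagged = [
    ("book", "NOUN"), ("book", "NOUN"), ("book", "VERB"),
    ("table", "NOUN"),
    ("run", "VERB"), ("run", "VERB"), ("run", "VERB"), ("run", "NOUN"),
    ("eat", "VERB"),
]
tagset = ["NOUN", "VERB"]

vectors = pos_feature_vectors(tagged, tagset)
clusters = kmeans(vectors, k=2)
```

In this toy run the mostly nominal words ("book", "table") end up in one cluster and the mostly verbal words ("run", "eat") in the other. In a class-based n-gram model, the resulting cluster indices would then replace individual words when estimating the class-level n-gram probabilities.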

