Joint-character-POC N-gram language modeling for Chinese speech recognition

Bin Wang, Akinori Kawamura, Jian Li, Zhijian Ou

doi:10.1109/iscslp.2014.6936588

Abstract

The state-of-the-art language models (LMs) for Chinese speech recognition are word n-gram models. However, in Chinese, characters are meaningful morphological units, and words are not consistently defined. There has been recent interest in building character n-gram LMs and combining them with word n-gram LMs. In this paper, to exploit both character-level and word-level constraints, we propose the joint n-gram LM: an n-gram model over joint states, where each joint state is a pair consisting of a character and its position-of-character (POC) tag. We point out a pitfall in naively solving the smoothing and scoring problems for joint n-gram models and provide corrected solutions. For experimental comparison, different LMs (word 4-grams, character 6-grams and joint 6-grams) are tested for speech recognition, using a training corpus of 1.9 billion characters. The joint n-gram LM improves recognition performance, especially on utterances containing out-of-vocabulary (OOV) words.
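The abstract defines a joint state as a character paired with its POC tag. The minimal sketch below illustrates that representation under an assumption the abstract does not state: a four-tag B/M/E/S scheme (B = first character of a multi-character word, M = middle, E = last, S = single-character word). The function names `to_joint_states`, `from_joint_states` and `joint_ngrams`, and the `<s>`/`</s>` padding symbols, are hypothetical. It shows how a word-segmented sentence becomes a joint-state sequence, how word boundaries can be read back from the tags, and which n-grams a joint n-gram model would count. It does not implement the paper's smoothing or scoring solutions.

```python
from typing import List, Tuple

# A joint state: (character, POC tag).
# ASSUMPTION: B/M/E/S tag set; the abstract does not specify the tag inventory.
JointState = Tuple[str, str]


def to_joint_states(words: List[str]) -> List[JointState]:
    """Map a word-segmented sentence to a sequence of (character, POC tag) pairs."""
    states: List[JointState] = []
    for word in words:
        if len(word) == 1:
            states.append((word, "S"))  # single-character word
            continue
        for i, ch in enumerate(word):
            if i == 0:
                tag = "B"  # first character of a multi-character word
            elif i == len(word) - 1:
                tag = "E"  # last character
            else:
                tag = "M"  # any middle character
            states.append((ch, tag))
    return states


def from_joint_states(states: List[JointState]) -> List[str]:
    """Recover the word segmentation: a word ends at every E or S tag."""
    words: List[str] = []
    current = ""
    for ch, tag in states:
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate an unterminated final word
        words.append(current)
    return words


def joint_ngrams(states: List[JointState], n: int) -> List[Tuple[JointState, ...]]:
    """List the n-grams over joint states, padded with hypothetical <s>/</s> markers."""
    padded = [("<s>", "<s>")] * (n - 1) + states + [("</s>", "</s>")]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    sentence = ["语音", "识别", "很", "有意思"]  # "speech recognition is interesting"
    states = to_joint_states(sentence)
    print(states)
    print(from_joint_states(states))
    for gram in joint_ngrams(states, 2):
        print(gram)
```

Because the tags make word boundaries recoverable, an n-gram over joint states carries word-level information while its units remain characters, which matches the abstract's stated goal of exploiting both character-level and word-level constraints.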

Similar Papers

Joint unsupervised adaptation of n-gram and RNN language models via LDA-based hybrid mixture modeling
Ryo Masumura ... Taichi Asami
01 Dec 2017

Improving N-gram language modeling for code-switching speech recognition
Zhiping Zeng ... Haihua Xu
01 Dec 2017

Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
X Chen ... X Liu
20 Aug 2017

Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus
T Matsuoka ... K Shirai
21 Apr 1997