Selection of acoustic modeling unit for Tibetan speech recognition based on deep learning

Baojia Gong, Yuntao Ding, Zhijie Cai, Maozhaxi Peng, Rangzhuoma Cai, I Barukčić

doi:10.1051/matecconf/202133606014

Abstract

Selecting the modeling unit is the primary problem of acoustic modeling in speech recognition, because the choice of unit directly affects overall recognition performance. To address this problem for Tibetan speech recognition, this paper analyzes the deficiencies of the existing acoustic modeling units and designs a Tibetan character segmentation and labeling model together with its algorithm flow. In experimental verification, the model and algorithm performed well, with the accuracy of Tibetan character segmentation and labeling reaching 99.98%.

Highlights

Automatic speech recognition technology is a key technology for human-computer interaction
The Tibetan character segmentation and labeling model in this article is mainly composed of preprocessing, segmentation and labeling modules
This paper designs a Tibetan character segmentation and labeling model and proposes the corresponding algorithm flow

Summary

Introduction

Automatic speech recognition is a key technology for human-computer interaction. In Tibetan speech recognition systems, researchers have considered modeling units of different granularity, including words and syllables [4], vowels [5,6,7,8] and phonemes [9,10,11]. If words or syllables are used as modeling units, the demands on the corpus are too high and data sparsity can result. To solve this problem, the paper proposes using the Tibetan character as the modeling unit and presents the flow of its segmentation and labeling algorithm. A Tibetan character is defined as any single character or Tibetan stacked combination symbol, including the base character, head letter, subjoined letter, and vowel. For example, in the Tibetan syllable བ ིགས, the symbols བ, ི, ག and ས are each one character, so the syllable is composed of four characters.
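The definition above can be illustrated with a minimal Python sketch. It is not the paper's algorithm, which is not reproduced on this page. It assumes that a "stacked combination symbol" corresponds to a base letter followed by Unicode subjoined consonants (U+0F90 to U+0FBC), and that vowel signs count as separate characters, as in the four-character example. The function name `segment_tibetan_characters` is hypothetical.

```python
from typing import List

# Assumption: Unicode subjoined consonants occupy U+0F90..U+0FBC and attach
# to the preceding base letter to form one stacked unit.
SUBJOINED_START, SUBJOINED_END = 0x0F90, 0x0FBC
TSHEG = 0x0F0B  # Tibetan syllable delimiter, skipped here


def segment_tibetan_characters(syllable: str) -> List[str]:
    """Split a Tibetan syllable into character-level modeling units.

    Illustrative rule only: subjoined letters are merged into the
    preceding unit; every other code point starts a new unit.
    """
    units: List[str] = []
    for ch in syllable:
        cp = ord(ch)
        if ch.isspace() or cp == TSHEG:
            continue  # ignore spacing debris and the syllable delimiter
        if SUBJOINED_START <= cp <= SUBJOINED_END and units:
            units[-1] += ch  # extend the current stack
        else:
            units.append(ch)  # base letter, vowel sign, or suffix letter
    return units


# The syllable from the text: U+0F56 (ba), U+0F72 (vowel i), U+0F42 (ga), U+0F66 (sa)
example = "\u0F56\u0F72\u0F42\u0F66"
print(len(segment_tibetan_characters(example)))  # 4

# A stack: U+0F66 (sa) + U+0F92 (subjoined ga) + U+0FB2 (subjoined ra) stays one unit
print(len(segment_tibetan_characters("\u0F66\u0F92\u0FB2")))  # 1
```

Under these assumptions the sketch yields four units for the example syllable, matching the count given in the text. Whether the paper treats a stack as one unit in exactly this way is not confirmed by the source.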

Tibetan character segmentation and labeling model

Tibetan character segmentation and labeling algorithm flow

Experiments

Experiment and result analysis

Summary

Journal: MATEC Web of Conferences	Publication Date: Jan 1, 2021
Citations: 3	License type: CC BY 4.0
