Standard Yorùbá context dependent tone identification using Multi-Class Support Vector Machine (MSVM)

A.A Sosimi, T Adegbola, O.A Fakinlede

doi:10.4314/jasem.v23i5.20

Abstract

Most state-of-the-art large vocabulary continuous speech recognition systems employ context dependent (CD) phone units; however, CD phone units are not efficient at capturing the long-term spectral dependencies of tone in most tone languages. Standard Yorùbá (SY) is a language composed of syllables with tones and requires a different method of acoustic modeling. In this paper, a context dependent tone acoustic model was developed. The tone unit is assumed to be the syllable. The average magnitude difference function (AMDF) was used to derive the utterance-wide F0 contour, followed by automatic syllabification and tri-syllable forced alignment with the Speech Phonetization Alignment and Syllabification (SPPAS) tool. For classification of the CD tone, the slope and intercept of the F0 values were extracted from each segmented unit. A supervised clustering scheme was used to partition the CD tri-tones by category, and the features were normalized based on some statistics to derive the acoustic feature vectors. A multi-class support vector machine (MSVM) was used for tri-tone training. From the experimental results, it was observed that the word recognition accuracy obtained from the MSVM tri-tone system based on dynamic programming tone embedded features was comparable with that obtained from phone features. The best parameter tuning was obtained with 10-fold cross validation, and the overall accuracy was 97.5678%. In terms of word error rate (WER), the MSVM CD tri-tone system outperforms the hidden Markov model tri-phone system, with a WER of 44.47%.

Keywords: Syllabification, Standard Yorùbá, Context Dependent Tone, Tri-tone Recognition
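The feature extraction outlined in the abstract (AMDF-based pitch tracking, then a slope and intercept per syllable) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pitch search range, and the use of an ordinary least-squares line over frame indices are all assumptions made for illustration.

```python
import math

def amdf(frame, lag):
    """Mean absolute difference between a frame and itself shifted by `lag`."""
    n = len(frame) - lag
    if n <= 0:
        raise ValueError("frame must be longer than the lag")
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Return the pitch (Hz) whose lag gives the deepest AMDF valley.

    f0_min and f0_max are assumed search bounds, not values from the paper.
    A wide range can produce octave errors, since multiples of the true
    period also give valleys.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag = min(range(lag_min, lag_max + 1), key=lambda lag: amdf(frame, lag))
    return sample_rate / best_lag

def slope_intercept(values):
    """Least-squares line through (0, v0), (1, v1), ...; needs >= 2 values."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In this sketch, `estimate_f0` would be applied to successive frames of a syllable, and `slope_intercept` to the resulting pitch values, giving two numbers per syllable that summarize whether the pitch is rising, falling, or level and at what height.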

Highlights

In recent times, automatic speech recognition (ASR) has been of special interest to researchers. Its application domain has expanded from the simplest digit recognition systems to portable cross-language spontaneous dialogue systems, a development due mainly to improvements in computational power and in modeling approaches for representing the speech signal.
The results show that the speech recognizer built upon the HMM/SVM segmentation outperforms the one built upon the generalized learning segmentation in terms of word error rate (WER) by about 0.05% on noisy data.
The multi-class support vector machine (MSVM) approach to context dependent tone recognition is suitable for the current study.
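Word error rate, the metric quoted in the highlights above, is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; the function name and whitespace tokenization are assumptions, and the paper's own scoring tool is not specified here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count.

    Words are split on whitespace; the reference must not be empty.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" counts one substitution and one deletion over four reference words.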

Summary

Introduction

In recent times, automatic speech recognition (ASR) has been of special interest to researchers. Its application domain has expanded from the simplest digit recognition systems to portable cross-language spontaneous dialogue systems, a development due mainly to improvements in computational power and in modeling approaches for representing the speech signal. Tone languages make up a large proportion of the spoken languages of the world, and yet lexical tone is an understudied feature. This is attributed to unsettled questions about how the vocabulary should be built, what should constitute the sub-word units, and how structures over these units are parameterized, modeled and trained. Several models have been proposed for tone language ASR. These techniques can be categorized into two main classes: (i) rule-based and (ii) data-based approaches. A drawback of the rule-based scheme is the generation, organization and representation of the interdependency of the rule set, as well as the unavailability of domain experts. These setbacks inspired the use of data-driven techniques for ASR (Kumalalo et al., 2010). The number of CD tri-tones is limited, which reduces model confusability compared with CD tri-phones, which require many hours of segmented and labelled speech. The objective of this paper is to develop a tri-tone acoustic model and explore the use of sub-segmental features for SY CD tone identification.
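To make the point about the limited number of tri-tone units concrete, the sketch below enumerates them under two assumptions: that SY has three level tones (high, mid, low), and that a tri-tone is a tone together with its left and right neighbours. The `left-centre+right` label notation and the `#` boundary symbol are borrowed from common tri-phone practice and are hypothetical here, not taken from the paper.

```python
from itertools import product

# Assumed tone inventory: High, Mid, Low.
TONES = ("H", "M", "L")

def tri_tone_labels(tones=TONES):
    """Enumerate every left-centre+right tone context."""
    return [f"{left}-{centre}+{right}"
            for left, centre, right in product(tones, repeat=3)]

def label_sequence(tones, boundary="#"):
    """Label each tone in an utterance with its left and right neighbours.

    `boundary` is a hypothetical symbol for the utterance edge.
    """
    padded = [boundary, *tones, boundary]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]
```

With three tones this gives 3 x 3 x 3 = 27 utterance-internal contexts, whereas the same construction over a phone inventory of size N gives N x N x N tri-phones.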

Journal: Journal of Applied Sciences and Environmental Management
Publication Date: Jun 18, 2019
License type: cc-by
