Abstract

Speech recognition converts an input sound into a sequence of phonemes and then finds the text for the input using language models. Phoneme classification performance is therefore a critical factor for the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics is still a challenging problem even for state-of-the-art classification methods, and classification errors are difficult to recover in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies recognition models better suited to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix obtained from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
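As a rough illustration of the clustering step described above, the sketch below groups phonemes whose mutual confusion rates are high. The confusion counts, the small phoneme subset, and the use of SciPy's agglomerative clustering are assumptions made for the example, not the paper's exact procedure or data.

```python
# Hypothetical confusion counts for a small phoneme subset (rows: true label,
# columns: predicted label); the paper derives a full confusion matrix from a
# TIMIT baseline recognizer.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

phonemes = ["b", "p", "d", "t", "s", "z"]
confusion = np.array([
    [50,  8,  3,  1,  0,  0],
    [ 9, 47,  1,  4,  0,  0],
    [ 4,  1, 52,  7,  0,  0],
    [ 1,  5,  8, 49,  1,  0],
    [ 0,  0,  0,  1, 55,  6],
    [ 0,  0,  0,  0,  7, 51],
], dtype=float)

# Row-normalize to confusion rates and symmetrize: two phonemes are "similar"
# if either one is frequently recognized as the other.
rates = confusion / confusion.sum(axis=1, keepdims=True)
similarity = (rates + rates.T) / 2.0

# Turn similarity into a distance and cluster hierarchically.
distance = 1.0 - similarity
np.fill_diagonal(distance, 0.0)
tree = linkage(squareform(distance, checks=False), method="average")
groups = fcluster(tree, t=3, criterion="maxclust")   # e.g. request 3 groups

for g in sorted(set(groups)):
    print(f"group {g}:", [p for p, label in zip(phonemes, groups) if label == g])
```

Averaging the row-normalized rates with their transpose makes the similarity symmetric, which is what distance-based hierarchical clustering expects; the resulting groups are then candidates for training group-specific classifiers.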

Highlights

  • These days, automatic speech recognition (ASR) performance has improved greatly by using deep neural networks [1,2,3,4,5]

  • Phonetic transcripts for all sentences are provided in the TIMIT corpus distribution

  • We propose a hierarchical speech recognition model based on phoneme clustering

Summary

Introduction

These days, automatic speech recognition (ASR) performance has improved greatly through the use of deep neural networks [1,2,3,4,5]. Choosing feature extraction methods and acoustic model types appropriately for confusable phonemes can help improve the final sentence recognition performance. We propose a novel method of applying phoneme-specific acoustic models to automatic speech recognition through a hierarchical phoneme classification framework. The hierarchical classification consists of a single baseline phoneme classifier, clustering of phonemes into similar groups, and final result generation using retrained group-specific models. Typical confusions include the ‘d’ and ‘t’ sounds in the words ‘dean’ and ‘teen’, or the ‘b’ and ‘p’ sounds in ‘bad’ and ‘pad’. These consonants can be distinguished by the presence of the glottal pulse, which occurs at periodic time intervals [18,19,20]; we therefore use autocorrelation functions to add a periodicity feature when a detected phoneme falls into one of the consonant categories.
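As a minimal sketch of such a periodicity cue, the function below takes the peak of the normalized autocorrelation over lags corresponding to plausible glottal-pulse periods: a periodic (voiced) frame yields a high score, an aperiodic (unvoiced) frame a low one. The frame length, sampling rate, and pitch range are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def periodicity_feature(frame, sample_rate=16000, f0_min=60.0, f0_max=400.0):
    """Peak of the normalized autocorrelation over plausible pitch lags,
    used here as a rough voiced/unvoiced score in the range 0..1."""
    frame = frame - np.mean(frame)
    if not np.any(frame):
        return 0.0
    # Autocorrelation for non-negative lags, normalized so lag 0 equals 1.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    acf = acf / acf[0]
    lag_min = int(sample_rate / f0_max)            # shortest glottal period
    lag_max = min(int(sample_rate / f0_min), len(acf) - 1)
    return float(np.max(acf[lag_min:lag_max + 1]))

# Toy check: a periodic (voiced-like) frame scores higher than noise.
t = np.arange(0, 0.03, 1 / 16000)                  # 30 ms frame
voiced = np.sin(2 * np.pi * 120 * t)               # 120 Hz periodic signal
unvoiced = np.random.default_rng(0).standard_normal(t.size)
print(periodicity_feature(voiced), periodicity_feature(unvoiced))
```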

Phoneme Clustering
Phonemes
Baseline Phoneme Recognition with TIMIT Dataset
Confusion Matrix
Phoneme Clustering Using Confusion Matrix
Hierarchical Phoneme Classification
Overall Architecture
Vowels and Mixed Phoneme Classification
Varying Analysis Window Sizes for Consonants
Voiced and Unvoiced Consonants Classification
TIMIT Database
Various Window Sizes
Phoneme Group Model Training
Performance of the Hierarchical Classification
Discussion
