Abstract

This paper presents a pattern recognition fusion method for isolated Malay digit recognition using Dynamic Time Warping (DTW) and Hidden Markov Model (HMM). The aim of the project is to increase the accuracy of Malay speech recognition. The study proposes an algorithm that fuses the outputs of the two recognition models. Endpoint detection, framing, normalization, Mel Frequency Cepstral Coefficient (MFCC) extraction and vector quantization are used to process the speech samples for recognition. The pattern recognition fusion method, which uses weight mean vectors, then combines the results of DTW and HMM. The algorithm is tested on speech samples that are part of a Malay corpus. The results show that the fusion technique can be used to fuse the pattern recognition outputs of DTW and HMM. The paper also introduces refinement normalization using the weight mean vector, which improves performance: the fused DTW-HMM recognizer reaches an accuracy of 94%, compared with 80.5% for DTW alone and 90.7% for HMM alone.
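To make the idea of fusing two recognizers concrete, the following is a minimal Python sketch of score-level fusion. It is not the paper's algorithm: the abstract does not spell out the weight mean vector rule, so the min-max normalization, the single weight `w_dtw`, and the function names `dtw_distance`, `min_max_normalize` and `fuse_scores` are all assumptions made for illustration. The sketch only assumes that DTW yields one distance per digit template (lower is better) and that each digit HMM yields a log-likelihood (higher is better).

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of equal-dimension feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean local distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to 0.5 everywhere."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(dtw_distances, hmm_log_likelihoods, w_dtw=0.5):
    """Hypothetical weighted-mean fusion; w_dtw is an assumed tuning weight."""
    # DTW distances are "lower is better", so flip them into similarities.
    dtw_sim = [1.0 - s for s in min_max_normalize(dtw_distances)]
    hmm_sim = min_max_normalize(hmm_log_likelihoods)
    fused = [w_dtw * d + (1.0 - w_dtw) * h for d, h in zip(dtw_sim, hmm_sim)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

In this sketch, `fuse_scores` takes one DTW distance and one HMM log-likelihood per candidate digit and returns the index of the digit with the highest fused score. In practice the weight would have to be tuned on held-out data.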

Highlights

  • In many speech recognition systems, endpoint detection and pattern recognition are used to detect the presence of speech in a background of noise

  • We have evaluated our algorithm using the data described in the methodology section

  • The recognition algorithms Hidden Markov Model (HMM), Dynamic Time Warping (DTW) and DTW-HMM pattern recognition fusion are tested for accuracy


Summary

Introduction

In many speech recognition systems, endpoint detection and pattern recognition are used to detect the presence of speech in a background of noise. The system that processes a word should detect where the word begins and ends. Detecting these endpoints seems easy for humans, but it has proven complicated for machines. Over the last three decades, a number of endpoint detection methods have been developed to improve the speed and accuracy of speech recognition systems. Speech Recognition (SR) is a technique aimed at converting a speaker’s spoken utterance into a text string or into input for other applications. SR is still far from a solved problem: the best reported word-error rates on English broadcast news and conversational telephone speech were 10% and 20%, respectively [2].
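As an illustration of what endpoint detection involves, here is a minimal Python sketch of a short-time energy detector. It is a generic textbook approach, not necessarily the method used in this paper, and it assumes the recording starts with a few frames of background noise. The function names and the parameters `frame_len`, `hop`, `noise_frames` and `factor` are hypothetical choices for illustration.

```python
def frame_energies(samples, frame_len, hop):
    """Mean squared amplitude of each (possibly overlapping) frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_endpoints(samples, frame_len=200, hop=100, noise_frames=5, factor=4.0):
    """Return (start, end) sample indices of the detected word, or None.

    Assumption: the first `noise_frames` frames contain only background noise,
    so their average energy can serve as a noise-floor estimate.
    """
    energies = frame_energies(samples, frame_len, hop)
    if len(energies) <= noise_frames:
        return None  # too short to estimate the noise floor
    noise = sum(energies[:noise_frames]) / noise_frames
    # Small floor keeps the threshold positive for perfectly silent lead-ins.
    threshold = max(noise * factor, 1e-10)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None  # no frame rose above the noise floor
    start = active[0] * hop
    end = active[-1] * hop + frame_len
    return start, end
```

A single energy threshold like this tends to fail on weak fricatives and in non-stationary noise, which is part of why the problem is harder for machines than it appears.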

