HMM-Based Techniques for Speech Segments Extraction

Waleed H Abdulla

doi:10.1155/2002/819429

Abstract

The goal of speech segment extraction is to separate the acoustic events of interest (the speech segment to be recognised) in a continuously recorded signal from the rest of the signal (the background). The recognition rate of many voice command systems depends heavily on the accuracy of speech segment extraction. This paper discusses two novel HMM-based techniques that segregate a speech segment from its concurrent background. The first method can be used reliably in clean environments, while the second, which makes use of wavelet denoising, is effective in noisy environments. Both methods have been implemented and shown to outperform other popular techniques, indicating that they have the potential to improve speech recognition rates.

Highlights

The increasing power of computers, combined with newly developed computational techniques, has contributed to the improved performance of speech recognition systems.
A vitally important objective in implementing an isolated-word speech recognition engine, commonly used in voice command systems, is the accurate separation of the signal of interest from its background environment. The success of this process has a crucial effect on the overall performance of isolated-word automatic speech recognition (ASR) systems.
Speech-silence discrimination in a long sentence does not require any modification of the algorithms used in the isolated-word case; the two are processed in the same way.

Summary

Introduction

The increasing power of computers, combined with newly developed computational techniques, has contributed to the improved performance of speech recognition systems. A vitally important objective in implementing an isolated-word speech recognition engine, commonly used in voice command systems, is the accurate separation of the signal of interest from its background environment. The success of this process has a crucial effect on the overall performance of isolated-word automatic speech recognition (ASR) systems. It is an issue that researchers have tackled since the earliest studies in this field. In some speech recognition techniques, such as dynamic time warping [27], the incoming spoken utterance must be as free as possible from non-speech regions to avoid such regions causing mismatching be-
