Speech Compression for Noise-Corrupted Thai Expressive Speech

Chomphan Chomphan

doi:10.3844/jcssp.2011.1565.1573

Abstract

Problem statement: In speech communication, speech coding aims to preserve speech quality at a lower coding bitrate. In real communication environments, various types of noise degrade the speech quality. Expressive speech with different speaking styles may also yield different speech quality under the same coding method. Approach: This research studied speech compression for noise-corrupted Thai expressive speech using two coding methods, CS-ACELP and MP-CELP. The speech material included one hundred male speech utterances and one hundred female speech utterances. Four speaking styles were included: enjoyable, sad, angry and reading. Five sentences of Thai speech were chosen. Three types of noise were included (train, car and air conditioner). Each type of noise was applied at five levels ranging from 0 to 20 dB. A subjective mean opinion score test was used in the evaluation. Results: The experimental results showed that CS-ACELP gave better speech quality than MP-CELP at all three bitrates of 6000, 8600 and 12600 bps. With respect to noise level, the 20-dB noise gave the best speech quality, while the 0-dB noise gave the worst. With respect to speaker gender, female speech gave better results than male speech. With respect to noise type, the air-conditioner noise gave the best speech quality, while the train noise gave the worst. Conclusion: The study shows that the coding method, type of noise, level of noise and speaker gender all influence the quality of coded speech.
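The noise levels in the abstract can be illustrated with a minimal sketch of mixing noise into speech at a target level. It assumes that the 0 to 20 dB levels denote signal-to-noise ratio (consistent with 20 dB giving the best quality) and that the five levels are evenly spaced at 0, 5, 10, 15 and 20 dB; neither assumption is stated in the abstract. The function names and the synthetic sine-wave signals are hypothetical stand-ins, not the study's material or procedure.

```python
import math


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals target_snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Required noise power is p_speech / 10^(SNR/10); gain is the amplitude factor.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB between a clean signal and its noisy version."""
    p_clean = sum(c * c for c in clean)
    p_error = sum((y - c) ** 2 for c, y in zip(clean, noisy))
    return 10 * math.log10(p_clean / p_error)


# Synthetic stand-ins for a speech utterance and a noise recording (8 kHz, 0.1 s).
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(800)]
noise = [0.5 * math.sin(2 * math.pi * 1234 * t / 8000 + 0.3) for t in range(800)]

# Assumed evenly spaced levels; the abstract only gives the 0-20 dB range.
for level in (0, 5, 10, 15, 20):
    noisy = mix_at_snr(speech, noise, level)
    print(level, "dB target ->", round(measured_snr_db(speech, noisy), 2), "dB measured")
```

In such a setup, the noisy signal would then be passed through each coder before listeners rate it.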

Highlights

In 1995, Conjugate-Structure Algebraic Code-Excited Linear Predictive (CS-ACELP) coding was developed and standardized as ITU-T G.729 speech coding at a coding rate of 8 kbps.
A few years later came the MP-CELP (Multi-Pulse Code-Excited Linear Predictive) speech coder. For flexible functionality, this coder employs multi-pulse excitation in which the number of pulses in the fixed-entry codebook is selectable for bitrate scalability and multiple-bitrate functionality, according to the MPEG-4 CELP speech coder requirements (Ozawa et al., 1996; Chomphan, 2010b).
In the MP-CELP speech coder, amplitudes or signs for generating the multi-pulse excitation are vector quantized simultaneously.

Summary

INTRODUCTION

Multimedia applications such as videophone and teleconferencing over ATM and the Internet have attracted considerable interest, and high-quality speech coders with low bitrates are in high demand (Chompun et al., 2000). These applications require special consideration for packet loss. To relieve this problem, a bitrate-scalable speech coder has been studied, in which the synthesized speech signal can be decoded from the received packets even when they contain only part of the full set of encoded packets. The speech coder operates at various bitrates ranging from 4 to 12 kbps by exploiting the flexibility of multi-pulse excitation coding (Chomphan, 2010a).
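The bitrate-scalable idea described above can be sketched with a toy example: the pulses of a multi-pulse excitation are grouped into layers, and the decoder rebuilds the excitation from whichever leading layers arrive. This is only an illustration under simplifying assumptions, not the MP-CELP or MPEG-4 bitstream format; the layering rule (largest pulses first), the function names and the pulse values are all hypothetical.

```python
def split_into_layers(pulses, pulses_per_layer):
    """Group (position, amplitude) pulses into layers; layer 0 acts as the base layer."""
    # Assumption for this sketch: largest-magnitude pulses go into the earliest layers.
    ordered = sorted(pulses, key=lambda p: abs(p[1]), reverse=True)
    return [ordered[i:i + pulses_per_layer]
            for i in range(0, len(ordered), pulses_per_layer)]


def decode_excitation(received_layers, frame_length):
    """Rebuild the excitation frame from the layers that were actually received."""
    excitation = [0.0] * frame_length
    for layer in received_layers:
        for position, amplitude in layer:
            excitation[position] += amplitude
    return excitation


def error_energy(reference, approximation):
    """Sum of squared differences between two frames."""
    return sum((r - a) ** 2 for r, a in zip(reference, approximation))


FRAME_LENGTH = 40  # hypothetical frame size in samples
pulses = [(3, 0.9), (10, -0.7), (17, 0.4), (25, -0.3), (31, 0.2), (36, 0.1)]

layers = split_into_layers(pulses, pulses_per_layer=2)
full = decode_excitation(layers, FRAME_LENGTH)

# Decoding a prefix of the layers mimics receiving only some of the packets.
for n_received in range(1, len(layers) + 1):
    partial = decode_excitation(layers[:n_received], FRAME_LENGTH)
    print(n_received, "layer(s): error energy =",
          round(error_energy(full, partial), 4))
```

Each additional layer adds pulses and reduces the error against the full excitation, which is the sense in which quality scales with the number of packets received.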

MATERIALS AND METHODS

RESULTS

CONCLUSION