An Investigation into the Voice Energy Level of Pronounced Persian Explosive Consonants by Signal Processing Approach

Hadi Jafari, Mir Mohammad Ettefagh, Peyman Jalali, Reza Hassanejad, Sina Varahram

doi:10.25518/0037-9565.6603

Abstract

Voice is a sound created by vibration of the vocal cords: as air passes through the larynx, the vocal cords approach each other and vibrate, producing different sounds. This vibration is influenced by the positions of the tongue, teeth and lips, as well as other factors. The vibrating air causes small pressure changes in the air around the speaker, which are perceived as voice. Signal processing methods in the frequency and time-frequency domains are among the most important tools for analyzing voice signals. The most conventional approach in the frequency domain is the Fast Fourier Transform (FFT). Time-frequency methods, on the other hand, can identify the frequency components of a signal and extract its time-varying characteristics, which makes them effective tools for extracting information from non-stationary signals. Time-frequency analysis methods can be classified into linear and nonlinear categories; the Short-Time Fourier Transform and the Wavelet Transform are the most conventional linear methods. This paper investigates the differences between the energy levels of explosive consonants in voiced and unvoiced modes using the most conventional frequency analysis and linear time-frequency analysis methods. These approaches are useful in phonetics and biomedical engineering for diagnosing laryngeal diseases and brain injuries and determining their severity. The results of this study indicate that the energy level of explosive consonants is higher in voiced mode than in unvoiced mode. In addition, the Fast Fourier and Wavelet Transforms present the results with better resolution and better image quality than the Short-Time Fourier Transform.
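The energy comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure or data: the two signals are synthetic stand-ins (a decaying noise burst for an unvoiced explosive consonant, and the same burst plus a 120 Hz tone for a voiced one), and the sampling rate, frame length, hop size and function names are assumptions chosen only for illustration. The sketch computes the energy in the time domain, from the FFT (the two agree by Parseval's theorem), and per frame from a Hann-windowed short-time Fourier transform.

```python
import numpy as np

def signal_energy(x):
    """Time-domain energy: sum of squared samples."""
    return float(np.sum(np.abs(x) ** 2))

def spectral_energy(x):
    """Energy from the FFT; equals signal_energy(x) by Parseval's theorem."""
    spectrum = np.fft.fft(x)
    return float(np.sum(np.abs(spectrum) ** 2) / len(x))

def stft_energy(x, frame_len=256, hop=128):
    """Energy of each Hann-windowed frame, computed from its FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        spectrum = np.fft.fft(frame)
        energies[i] = np.sum(np.abs(spectrum) ** 2) / frame_len
    return energies

# Hypothetical test signals (assumptions, not recordings from the study).
fs = 16000                                   # assumed sampling rate in Hz
t = np.arange(int(0.05 * fs)) / fs           # 50 ms of samples
rng = np.random.default_rng(0)               # fixed seed for repeatability
burst = 0.3 * rng.standard_normal(t.size) * np.exp(-t / 0.01)

unvoiced = burst                                        # burst only
voiced = burst + 0.5 * np.sin(2 * np.pi * 120 * t)      # burst plus voicing tone

for name, sig in (("unvoiced", unvoiced), ("voiced", voiced)):
    print(name,
          "time-domain energy:", round(signal_energy(sig), 3),
          "FFT energy:", round(spectral_energy(sig), 3),
          "frames:", len(stft_energy(sig)))
```

With these synthetic signals the voiced example carries more energy simply because a periodic component was added to it; real recordings would need segmentation and normalization before such a comparison is meaningful.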

Full Text