Abstract
This paper presents a voiced/unvoiced classification algorithm for noisy speech signals based on two acoustic features. Short-time energy (STE) and short-time zero-crossing rate (ZCR) are among the most distinctive time-domain features for classifying speech into voiced and unvoiced segments. A new approach is developed in which a narrow-band speech signal is processed frame by frame using its spectrogram image. In the first stage, each frame of the spectrogram is divided into three sub-bands and the pattern of their short-time energy ratios is examined. An energy ratio pattern-matching lookup table is then used to classify the voicing activity. This stage successfully classifies patterns 1 through 4 of the lookup table but fails on the remaining patterns. The remaining patterns are therefore resolved in a second stage, where the frame-wise short-time average zero-crossing rate is compared with a threshold value. In this study, the threshold is calculated from the short-time average zero-crossing rate of white Gaussian noise (WGN). The accuracy of the proposed method is evaluated using both male and female speech waveforms at different signal-to-noise ratios (SNRs). Experimental results show that the proposed method achieves better accuracy than the conventional methods in the literature.
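The two ingredients described above, sub-band energy ratios per frame and a ZCR threshold derived from white Gaussian noise, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the equal-width band split, the Hann window, the 240-sample frame length, the 8 kHz sampling rate, and all function names are assumptions made here, and the paper's lookup table is not reproduced because the abstract does not specify its entries.

```python
import numpy as np

def subband_energy_ratios(frame, n_bands=3):
    """Fraction of a frame's spectral energy in each of n_bands bands.

    Assumption: bands are equal-width slices of the one-sided spectrum;
    the paper's actual band edges are not given in the abstract.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    energies = np.array([band.sum() for band in np.array_split(power, n_bands)])
    total = energies.sum()
    if total == 0.0:
        # Silent frame: report a flat pattern instead of dividing by zero.
        return np.full(n_bands, 1.0 / n_bands)
    return energies / total

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return float(np.mean(signs[1:] != signs[:-1]))

def wgn_zcr_threshold(frame_len, n_frames=200, seed=0):
    """Average ZCR over frames of synthetic white Gaussian noise.

    Assumption: the threshold is the plain mean over n_frames noise frames.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_frames, frame_len))
    return float(np.mean([zero_crossing_rate(f) for f in noise]))

# Hypothetical usage: a 150 Hz tone stands in for a voiced frame.
fs = 8000                      # assumed sampling rate in Hz
t = np.arange(240) / fs        # one 30 ms frame
voiced_like = np.sin(2 * np.pi * 150 * t)

ratios = subband_energy_ratios(voiced_like)
threshold = wgn_zcr_threshold(frame_len=240)
is_below_threshold = zero_crossing_rate(voiced_like) < threshold
```

In this toy case almost all of the energy falls in the lowest band and the ZCR sits well below the noise-derived threshold, which is the kind of behavior expected of voiced frames. Unvoiced frames tend to have a ZCR closer to that of the noise.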
Highlights
Speech processing is an interesting area of signal processing in which voiced/unvoiced classification is one of the classic problems
We propose an approach to speech classification using short-time sub-band energy features of spectrogram images of speech signals
The classification decision is made by matching the sub-band energy ratio pattern against a lookup table
Summary
Speech processing is an interesting area of signal processing in which voiced/unvoiced classification is one of the classic problems. Researchers have devoted considerable effort to it in recent years, but results are still not satisfactory in noisy environments. Speech has several fundamental characteristics in both the time domain and the frequency domain. In the time domain, its features include short-time energy, short-time zero-crossing rate, and short-time autocorrelation. Speech can be divided into voiced and unvoiced regions. Short-time energy and short-time zero-crossing rate are the most important features for detecting voiced and unvoiced speech in both noisy and noiseless environments. Numerous speech processing applications, such as speech synthesis, speech enhancement, and speech recognition, depend heavily on the successful segmentation of the speech signal into voiced, unvoiced region
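For reference, the two time-domain features named above are commonly defined as follows for a frame of N samples ending at sample n. These are the standard textbook forms with a rectangular window, given here as an assumption; the exact definitions used in the paper may differ.

```latex
% Short-time energy over the frame ending at sample n
E_n = \sum_{m=n-N+1}^{n} x(m)^2

% Short-time average zero-crossing rate over the same frame
Z_n = \frac{1}{2N} \sum_{m=n-N+1}^{n}
      \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right|,
\qquad
\operatorname{sgn}[x] =
\begin{cases}
  1,  & x \ge 0 \\
  -1, & x < 0
\end{cases}
```

Voiced frames typically show high E_n and low Z_n, while unvoiced frames show the opposite tendency.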
More From: Science Journal of Circuits, Systems and Signal Processing