Abstract

Endpoint detection, which means distinguishing speech from non-speech segments, is considered one of the key preprocessing operations in automatic speech recognition (ASR) systems. Usually, the energy of the speech signal and the Zero Crossing Rate (ZCR) are used to locate the beginning and end of an utterance. Both of these methods have been shown to be effective for endpoint detection; however, they tend to fail in high-noise environments. In this paper, we combine a modified Teager energy approach with Energy-Entropy Features. In the new algorithm, the Teager energy is used to determine crude endpoints, and the Energy-Entropy Features are used to make the final decision. The advantage of this method is that there is no need to estimate the background noise. It is therefore particularly useful when the noise at the beginning or end of the utterance is very strong, or when there is not enough “silence” at the beginning or end of the utterance. Experimental results on Farsi speech show that the accuracy of this algorithm is satisfactory for speech endpoint detection.
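
The abstract does not specify the modified Teager operator, the exact Energy-Entropy Feature formula, or the decision thresholds, so the following Python sketch only illustrates the two-stage idea under stated assumptions. It uses the standard Teager-Kaiser operator ψ[n] = x[n]² − x[n−1]·x[n+1] (not the paper's modified version), an energy-entropy feature of the form sqrt(1 + |E·H|) (E = frame energy, H = spectral entropy), which is one formulation found in the literature and is assumed here, and thresholds set as fractions of each feature's range within the utterance, so no separate background-noise estimate is needed. All function names and parameter values (`teager_ratio`, `eef_ratio`, `margin`) are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def teager_energy(frames):
    """Mean absolute Teager-Kaiser energy per frame (standard operator,
    assumed here in place of the paper's modified version):
    psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    psi = frames[:, 1:-1] ** 2 - frames[:, :-2] * frames[:, 2:]
    return np.mean(np.abs(psi), axis=1)


def energy_entropy_feature(frames, eps=1e-12):
    """Assumed EEF form: sqrt(1 + |E * H|), where E is the frame energy and
    H is the entropy of the normalized power spectrum of the windowed frame."""
    energy = np.sum(frames ** 2, axis=1)
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    p = power / (np.sum(power, axis=1, keepdims=True) + eps)
    entropy = -np.sum(p * np.log(p + eps), axis=1)
    return np.sqrt(1.0 + np.abs(energy * entropy))


def relative_threshold(feature, ratio):
    """Threshold at a fraction of the feature's range in this utterance
    (hypothetical rule; avoids estimating background noise explicitly)."""
    return feature.min() + ratio * (feature.max() - feature.min())


def detect_endpoints(x, frame_len=256, hop=128,
                     teager_ratio=0.1, eef_ratio=0.2, margin=10):
    """Two-stage sketch: Teager energy gives crude endpoints, then the
    energy-entropy feature refines them. Returns (start, end) in samples,
    or None if no activity is found."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return None
    frames = frame_signal(x, frame_len, hop)

    # Stage 1: crude endpoints from the Teager energy contour.
    te = teager_energy(frames)
    active = np.flatnonzero(te > relative_threshold(te, teager_ratio))
    if active.size == 0:
        return None
    crude_start, crude_end = active[0], active[-1]

    # Stage 2: refine with the EEF inside a window around the crude endpoints.
    lo = max(crude_start - margin, 0)
    hi = min(crude_end + margin, len(frames) - 1)
    eef = energy_entropy_feature(frames[lo:hi + 1])
    voiced = np.flatnonzero(eef > relative_threshold(eef, eef_ratio)) + lo
    if voiced.size == 0:
        # Fall back to the crude endpoints if the EEF gives no decision.
        start_frame, end_frame = crude_start, crude_end
    else:
        start_frame, end_frame = voiced[0], voiced[-1]

    return int(start_frame * hop), int(end_frame * hop + frame_len)
```

Because both thresholds are relative to the feature range of the utterance itself, the sketch does not rely on leading silence to estimate the noise, which mirrors the advantage claimed above; a practical system would tune `teager_ratio`, `eef_ratio`, and `margin`, or replace these rules with the paper's own decision logic.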
