Abstract
Speech endpoint detection is one of the key problems in the practical application of speech recognition systems. In this paper, a speech signal containing chirp is decomposed into several intrinsic mode functions (IMFs) by ensemble empirical mode decomposition (EEMD). EEMD also eliminates the mode mixing phenomenon that usually arises when speech signals are processed with empirical mode decomposition (EMD). An adaptive algorithm then selects the IMFs that contain most of the noise. Finally, the selected IMFs and the speech signal containing chirp are input to independent component analysis (ICA), which separates out the clean speech signal and thereby improves the accuracy of speech endpoint detection. The results show that the proposed method is effective and has strong noise immunity, making it especially suitable for speech endpoint detection at low signal-to-noise ratio (SNR).
Highlights
Speech endpoint detection is of great significance in speech signal processing
There are many speech endpoint detection methods, such as those based on short-time energy, short-time zero-crossing rate, information entropy, Mel-frequency cepstral coefficients (MFCC), hidden Markov models (HMM), and the wavelet transform
Because the above methods cannot detect speech accurately at low signal-to-noise ratio (SNR), this paper proposes an endpoint detection method based on ensemble empirical mode decomposition (EEMD) [4]
Summary
Speech endpoint detection is of great significance in speech signal processing. Accurate endpoint detection improves the accuracy of speech recognition and reduces the amount of data that must be processed. There are many speech endpoint detection methods, such as those based on short-time energy, short-time zero-crossing rate, information entropy, Mel-frequency cepstral coefficients (MFCC), hidden Markov models (HMM), and the wavelet transform. These methods still have shortcomings, especially under low signal-to-noise ratio (SNR) conditions, where they cannot detect speech accurately. This paper therefore proposes an endpoint detection method based on ensemble empirical mode decomposition (EEMD) [4]. The resulting decomposition has real physical meaning and a higher time-frequency resolution. This analysis method promises to be a major step forward in the analysis of non-stationary, nonlinear speech signals.
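To make the classical baselines concrete, the following is a minimal sketch of short-time energy and short-time zero-crossing rate, with a simple energy threshold used to locate endpoints. It is not the EEMD and ICA method proposed in the paper. The function names, the frame length, the hop size, the threshold ratio, and the synthetic test signal are all assumptions chosen for illustration.

```python
import math
import random


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(x, frame_len=200, hop=100, ratio=0.1):
    """Return (start_sample, end_sample) spanning all frames whose energy
    exceeds ratio * (maximum frame energy), or None if no frame qualifies.
    The relative threshold is an illustrative assumption."""
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        return None
    energies = [short_time_energy(f) for f in frames]
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * hop, active[-1] * hop + frame_len


def make_test_signal(n=8000, fs=8000, start=3000, end=5000, noise_std=0.01):
    """Hypothetical test input: weak Gaussian noise with a 200 Hz tone
    burst standing in for speech between samples start and end."""
    rng = random.Random(0)
    signal = []
    for i in range(n):
        tone = math.sin(2 * math.pi * 200 * i / fs) if start <= i < end else 0.0
        signal.append(tone + rng.gauss(0.0, noise_std))
    return signal


if __name__ == "__main__":
    x = make_test_signal()
    print("detected endpoints (samples):", detect_endpoints(x))
    frames = frame_signal(x, 200, 100)
    print("ZCR of a noise-only frame:", zero_crossing_rate(frames[0]))
    print("ZCR of a tone frame:", zero_crossing_rate(frames[35]))
```

At high SNR the energy threshold separates the burst from the background cleanly. As the noise level approaches the signal level, frame energies of noise and speech overlap and a fixed threshold stops working, which is the weakness the summary attributes to these methods.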