Abstract

Segmentation into informative regions is an important stage in speech pre-processing. The quality of segmentation affects the performance of almost all known speech technology applications (speech recognition, speaker identification, speech-to-text conversion, etc.). This article presents an improved speech/pause segmentation algorithm. The original algorithm is based on the probability density function of the background noise and on the analysis of the one-dimensional Mahalanobis distance computed for the discrete samples of the speech signal under investigation. The improvement consists in splitting the speech signal into fragments and decomposing each fragment into empirical modes, so that the one-dimensional Mahalanobis distance of the discrete samples can be analyzed for each mode separately. The improved algorithm was studied in comparison with the original algorithm and with well-known segmentation methods based on zero-crossing rate and short-time energy. The results show that the improved algorithm provides the best detection of the beginning and end boundaries of informative speech segments, with errors of the first and second kind of 4.5767 % and 1.421 %, respectively.
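The baseline idea described above (model the background noise, then measure how far each sample lies from it) can be illustrated with a minimal sketch. For a single variable, the Mahalanobis distance reduces to d[n] = |x[n] - μ| / σ, where μ and σ are the mean and standard deviation of the noise. The sketch below assumes, purely for illustration, that the noise is summarized by a Gaussian model estimated from a leading noise-only segment, that a sample counts as informative when d[n] exceeds 3, and that a frame is labeled speech by majority vote. None of these parameters come from the article, and the names `mahalanobis_1d`, `speech_pause_mask` and `make_test_signal` are hypothetical. The empirical-mode-decomposition refinement is not shown.

```python
import math
import random
import statistics


def mahalanobis_1d(samples, noise_mean, noise_std):
    """One-dimensional Mahalanobis distance |x - mu| / sigma for each sample."""
    return [abs(x - noise_mean) / noise_std for x in samples]


def speech_pause_mask(signal, noise_len, threshold=3.0, frame_len=160):
    """Label each frame of `signal` as speech (True) or pause (False).

    Illustrative assumptions, not the article's parameters:
    - the first `noise_len` samples contain background noise only;
    - the noise is summarized by its mean and standard deviation;
    - a sample is informative when its distance exceeds `threshold`;
    - a frame is speech when more than half of its samples are informative.
    """
    noise = signal[:noise_len]
    mu = statistics.fmean(noise)
    sigma = statistics.pstdev(noise, mu)
    if sigma == 0:
        raise ValueError("noise segment has zero variance")

    # Per-sample decision: is this sample unlikely under the noise model?
    flags = [d > threshold for d in mahalanobis_1d(signal, mu, sigma)]

    # Frame-level majority vote turns per-sample flags into speech/pause labels.
    mask = []
    for start in range(0, len(flags), frame_len):
        frame = flags[start:start + frame_len]
        mask.append(sum(frame) > len(frame) / 2)
    return mask


def make_test_signal(seed=0):
    """Hypothetical synthetic input: pause, tone burst, pause (1600 samples each)."""
    rng = random.Random(seed)

    def noise():
        # Hypothetical background noise: zero-mean Gaussian, sigma = 0.01.
        return rng.gauss(0.0, 0.01)

    pause = [noise() for _ in range(1600)]
    tone = [math.sin(2 * math.pi * 0.05 * n) + noise() for n in range(1600)]
    tail = [noise() for _ in range(1600)]
    return pause + tone + tail


if __name__ == "__main__":
    mask = speech_pause_mask(make_test_signal(), noise_len=800)
    # "S" marks a speech frame, "." a pause frame.
    print("".join("S" if m else "." for m in mask))
```

On the synthetic signal, the middle third (the tone burst) should be marked as speech and the surrounding pauses as noise. The boundaries of the resulting runs are the detected start and end points. The frame majority vote is one simple way to suppress isolated noise samples that cross the threshold. In the improved algorithm described above, the same distance analysis would instead be applied to each empirical mode separately.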
