Abstract

A three-way classifier is introduced that reliably differentiates between voiced, unvoiced, and silence segments of a speech utterance. The extreme points of the median-smoothed first difference of the classifier output are used to detect phoneme boundaries. The classification criterion is calculated by weighting the log of the ratio of high-pass and low-pass filtered versions of the speech utterance by clipped versions of a normalized root average energy and a normalized zero-crossing rate. The high-pass and low-pass filtering operations are performed simply by calculating the sample difference signal and the sample addition signal. The frequency-domain attenuation characteristics corresponding to these simple filters are those of a quarter period of a cosine wave and a sine wave, respectively. These filters were found to be sufficient for the purpose of classification. The phoneme boundary detector can be used for the mechanized acquisition of the phoneme inventory of any language and, because the operations involved are simple, in automatic speech recognition.
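The per-frame quantities named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of frame energies inside the log ratio, and the small constant `eps` are all assumptions. The abstract does not specify the clipping thresholds, the normalization, or how the weights are combined, so those steps are left out rather than guessed.

```python
import math

def diff_and_sum(x):
    """Sample difference (high-pass) and sample addition (low-pass) signals."""
    high = [x[n] - x[n - 1] for n in range(1, len(x))]
    low = [x[n] + x[n - 1] for n in range(1, len(x))]
    return high, low

def log_band_ratio(frame, eps=1e-12):
    """Log of the high-pass to low-pass energy ratio for one frame.

    Using energies and the guard constant eps are assumptions of this sketch.
    """
    high, low = diff_and_sum(frame)
    e_high = sum(v * v for v in high)
    e_low = sum(v * v for v in low)
    return math.log((e_high + eps) / (e_low + eps))

def rms(frame):
    """Root average energy of one frame (unnormalized)."""
    return math.sqrt(sum(v * v for v in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def filter_gains(omega):
    """Magnitude gains of the two filters at radian frequency omega in [0, pi].

    |1 - exp(-j*omega)| = 2*|sin(omega/2)| for the difference filter,
    |1 + exp(-j*omega)| = 2*|cos(omega/2)| for the addition filter.
    """
    return 2 * abs(math.sin(omega / 2)), 2 * abs(math.cos(omega / 2))

if __name__ == "__main__":
    # Synthetic test frames (hypothetical data, not from the paper).
    low_tone = [math.sin(2 * math.pi * 0.01 * n) for n in range(200)]
    alternating = [(-1.0) ** n for n in range(200)]
    print(log_band_ratio(low_tone), zero_crossing_rate(low_tone))
    print(log_band_ratio(alternating), zero_crossing_rate(alternating))
```

In this sketch a slowly varying frame yields a negative log ratio and a low zero-crossing rate, while a rapidly alternating frame yields a positive log ratio and a high zero-crossing rate, which is the kind of contrast a voiced/unvoiced decision can build on.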

