Abstract

Articulatory event detectors, i.e., detectors of transitions from one articulatory state to another, are constructed on the basis of an analysis of spectral–temporal inhomogeneities in the speech signal. A triad such as /pause–fricative–vowel/ is segmented and recognized in the space of the principal components of three spectra: the response spectrum of the pause–fricative transition detector, the spectrum of the fricative at its peak energy, and the response spectrum of the fricative–vowel transition detector at that detector's peak. The root-mean-square error relative to manual labeling averages about 12 ms for the fricative onset and about 5 ms for the moment of the fricative–vowel transition. Recognition errors for triads with the same fricative but different following vowels, and for triads differing only in the presence or absence of vocal excitation, amounted to a few percent.
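The timing figures above are root-mean-square errors between automatically detected boundaries and manually marked ones. As a minimal sketch of how such a figure might be computed, assuming boundaries are given as paired lists of times in milliseconds, the following uses a hypothetical helper `boundary_rmse_ms` and made-up illustrative values; neither comes from the study itself.

```python
import math


def boundary_rmse_ms(detected_ms, manual_ms):
    """Root-mean-square error (ms) between detected and manually marked boundaries."""
    if len(detected_ms) != len(manual_ms) or not detected_ms:
        raise ValueError("need two non-empty sequences of equal length")
    # Squared timing difference for each detected/manual boundary pair.
    squared = [(d - m) ** 2 for d, m in zip(detected_ms, manual_ms)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical boundary times in milliseconds (illustrative only, not study data).
detected = [103.0, 251.0, 398.0]
manual = [100.0, 255.0, 400.0]
rmse = boundary_rmse_ms(detected, manual)
print(f"RMSE: {rmse:.2f} ms")
```

Because the differences are squared before averaging, a single large boundary error raises the RMSE more than several small ones do, so the measure is sensitive to occasional gross misdetections.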
