Abstract

Of 58 papers published so far this year in the Journal of Phonetics, 16 (28%) feature Voice Onset Time (VOT) or related measurements, confirming that VOT remains a central concern in the field. However, phoneticians’ VOT measurements generally continue to rely on human judgment, which requires significant labor, makes even large laboratory experiments onerous, and prevents the field from taking full advantage of the millions of hours of digital speech now becoming available. We present an algorithm for accurate automatic measurement of VOT, combining HMM forced alignment for determining approximate stop boundaries with paired burst and voicing onset detectors. Each detector is a frame-level max-margin classifier operating on the scale-space projection of a small number of relevant acoustic features. On a large set of clean lab speech, this system has a mean absolute error (relative to human annotation) of only 2.8 ms, with 98% of errors <10 ms. On a subcorpus independently annotated by two of the authors, the system agreed with the two human annotators as well as they agreed with one another (1.49 vs 1.50 ms). Promising results on other datasets will be reported. The system will be released as open-source software.
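
The two accuracy figures quoted above (mean absolute error and the share of errors under 10 ms) can be computed as in the minimal sketch below. It assumes paired lists of automatic and human-annotated VOT values in milliseconds. The function name `vot_error_summary` and the sample values are hypothetical illustrations, not part of the system described here or of its data.

```python
from typing import Sequence, Tuple


def vot_error_summary(
    predicted_ms: Sequence[float],
    reference_ms: Sequence[float],
    tolerance_ms: float = 10.0,
) -> Tuple[float, float]:
    """Return (mean absolute error in ms, fraction of errors below tolerance_ms).

    Hypothetical helper: predicted_ms holds automatic VOT measurements and
    reference_ms holds the human annotations for the same tokens.
    """
    if len(predicted_ms) != len(reference_ms) or len(predicted_ms) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Absolute deviation of each automatic measurement from the annotation.
    abs_errors = [abs(p - r) for p, r in zip(predicted_ms, reference_ms)]

    mean_abs_error = sum(abs_errors) / len(abs_errors)
    # Strict inequality, matching the "<10 ms" criterion.
    fraction_within = sum(e < tolerance_ms for e in abs_errors) / len(abs_errors)
    return mean_abs_error, fraction_within


if __name__ == "__main__":
    # Made-up values for illustration only.
    predicted = [10.0, 22.0, 35.0, 50.0]
    reference = [12.0, 20.0, 35.0, 62.0]
    mae, within = vot_error_summary(predicted, reference)
    print(f"MAE = {mae:.2f} ms, fraction < 10 ms = {within:.2f}")
```

The same function applies to the inter-annotator comparison: passing one annotator's values as `predicted_ms` and the other's as `reference_ms` gives the human-human figure.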
