Abstract

Manually labeling the voice onset time (VOT) of stop consonants in a speech database is feasible but time-consuming and tedious. This paper proposes a fully automatic VOT estimation method to alleviate this burden. The method relies on an HMM-based phone recognizer and a random forest (RF) based onset detector. The phone recognizer performs forced alignment to locate stop consonants, and the onset detector searches each aligned stop segment for the onsets of the burst and of voicing. The time interval between these two onsets is the estimated VOT for that stop consonant. The merit of the proposed method lies in the RF-based onset detector, which provides accurate onset detection with only a small amount of training data. The proposed method was evaluated on the test set of the TIMIT database, which includes 2,344 word-initial and 1,440 word-medial stops. The experimental results show that 81.2% of the estimates fall within 10 ms of the reference values, and 95.7% within 20 ms.
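
As a minimal illustration of the two quantities named in the abstract, the sketch below computes a VOT as the interval between a burst onset and a voicing onset, and scores a set of estimates by the fraction that fall within a tolerance of the reference values. The function names and the sample numbers are hypothetical and are not taken from the paper; the HMM alignment and the RF onset detector themselves are not shown.

```python
from typing import Sequence


def vot_ms(burst_onset_ms: float, voicing_onset_ms: float) -> float:
    """Return VOT as the interval from burst onset to voicing onset (ms)."""
    return voicing_onset_ms - burst_onset_ms


def fraction_within(
    estimates_ms: Sequence[float],
    references_ms: Sequence[float],
    tolerance_ms: float,
) -> float:
    """Fraction of estimates whose absolute error is at most tolerance_ms."""
    if len(estimates_ms) != len(references_ms) or not estimates_ms:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1
        for est, ref in zip(estimates_ms, references_ms)
        if abs(est - ref) <= tolerance_ms
    )
    return hits / len(estimates_ms)


# Made-up example values, for illustration only.
estimated = [60.0, 25.0, 48.0, 90.0]
reference = [65.0, 40.0, 50.0, 88.0]
within_10 = fraction_within(estimated, reference, 10.0)
within_20 = fraction_within(estimated, reference, 20.0)
```

Under this reading, the reported 81.2% and 95.7% correspond to `fraction_within` evaluated at tolerances of 10 ms and 20 ms over all test stops.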
