Abstract

Many studies in phonetics and phonology have analyzed segmental duration: which factors, and which interactions among them, determine the duration of speech sounds. Their results often play an important role in language technology applications widely used in infocommunication, for example text-to-speech synthesis (TTS) and automatic speech recognition (ASR). Speech sound duration depends on various factors such as phonetic quality, phonological context, phonological position in the word or in the utterance, speech style, etc. We set out to predict vowel duration in spontaneous speech automatically, using three methods: (i) a classification and regression tree (CART) using some characteristic features of vowel quality and context; (ii) a feedforward neural network (FFNN) using the same features; and (iii) an FFNN using a combination of the characteristic features and spectral features. Empirical durational data were obtained by measuring vowel durations in over 110 minutes of speech from a large Hungarian spontaneous speech database (BEA). With CART, the correlation between measured and predicted vowel duration was poor (0.57), with an average root mean square error (RMSE) of approximately 33 ms. With FFNN the results were slightly better: the correlation between target and predicted vowel duration was 0.62 and the RMSE was about 29 ms. With the combined features the results improved further: the correlation between target and predicted vowel duration was 0.79 and the RMSE was 25 ms. The results obtained for Hungarian confirm that vowel duration is affected by a complex set of features, and they also indicate the temporal complexity of the segmental level of spontaneous speech, as has already been reported for Lithuanian, Czech, Hindi, Telugu and Korean.
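
The two evaluation measures reported above, the correlation between measured and predicted durations and the RMSE, can be illustrated with a minimal sketch. It assumes the correlation is Pearson's r, which the abstract does not state. The function names and the duration values are hypothetical and are not taken from the BEA data.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between two equal-length sequences (same unit, e.g. ms)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(measured, predicted):
    """Pearson correlation coefficient (assumed here; the abstract does not name the type)."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_m * var_p)


# Hypothetical vowel durations in milliseconds (illustrative only, not real data).
measured_ms = [60.0, 85.0, 120.0, 95.0, 70.0]
predicted_ms = [70.0, 80.0, 100.0, 90.0, 75.0]

print(f"RMSE: {rmse(measured_ms, predicted_ms):.1f} ms")
print(f"Pearson r: {pearson_r(measured_ms, predicted_ms):.2f}")
```

A lower RMSE and a correlation closer to 1 both indicate that predicted durations track the measured ones more closely, which is the sense in which the three methods are compared.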
