Abstract

Detecting prosodic events and prosodic stress, and segmenting speech based on prosody, have received much attention in the research community over the past decades. Prosody is relevant to both main areas of speech technology, text-to-speech synthesis and automatic speech recognition and understanding, and is increasingly exploited: besides providing redundancy, prosody is recognized to carry information unavailable from other sources, and it contributes to the naturalness of perceived speech. This paper addresses a recently proposed intonation analysis technique called Weighted Correlation based Atom Decomposition (WCAD). The WCAD approach is inspired by the physiology of speech production and by the Fujisaki model used in speech synthesis; however, it is employed analytically rather than generatively: the intonation contour is decomposed into a set of elementary components, called atoms, by a pattern matching algorithm. The resulting atom decomposition is used for prosodic stress detection and automatic phonological phrasing. We compare the WCAD approach with, and also combine it with, a phonological approach that segments speech into phonological phrases automatically using a Gaussian Mixture Model (GMM) / Hidden Markov Model (HMM) system and Viterbi alignment. Results show that the physiologically inspired system performs comparably to the phonologically conceived one in phonological phrasing for two fixed-stress languages from different language families: Hungarian and French. With this, we also intend to confirm experimentally that the physiologically inspired WCAD model can predict or extract linguistically relevant markers linked to meaning. Finally, a hybrid model combining the physiologically and the phonologically inspired approaches is proposed and evaluated on phonological phrase and prosodic stress detection in both languages. The hybrid model outperforms both individual systems.
The basic algorithmic steps for feature extraction and atom decomposition are, as a whole, applicable to a wide range of languages. However, linking their output to linguistic levels and meaning is by nature language-specific: which event corresponds to which linguistic cue or function cannot be determined without knowing the language.
