Abstract

Prosody is used by human listeners to disambiguate spoken language; in particular, the relative size and location of prosodic phrase boundaries provide an important cue for resolving syntactic ambiguity. Automatically detected prosodic phrase boundaries should therefore help a speech understanding system choose among several candidate parses. Here, we propose two scoring algorithms for ranking candidate parses. Both are based on an analysis/synthesis approach that compares the recognized prosodic phrase structure (analysis) with the structure predicted for each candidate parse (synthesis). The two scoring algorithms, one rule-based and one using a probabilistic model, yield similar overall results when evaluated in experiments with a corpus of ambiguous sentences read by FM radio announcers. To decouple the performance of the analysis and synthesis components, we have also applied the scoring algorithms to hand-labeled breaks, which yields disambiguation performance comparable to that of human subjects in perceptual experiments. Performance degrades somewhat when automatically recognized breaks are used instead.
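The compare-and-rank idea behind the analysis/synthesis approach can be illustrated with a minimal sketch. It assumes that both the recognized (analysis) and predicted (synthesis) prosodic structures are encoded as one integer break index per word boundary, and it scores each candidate parse by the total absolute difference between the two sequences, preferring the parse with the lowest score. The function names `mismatch_score` and `rank_parses`, the break-index encoding, and the example values are hypothetical. This is not the paper's rule-based or probabilistic scoring algorithm, only an illustration of ranking parses by how well their predicted prosody matches the observed prosody.

```python
from typing import Dict, List, Sequence


def mismatch_score(observed: Sequence[int], predicted: Sequence[int]) -> int:
    """Hypothetical score: total absolute difference between observed
    (analysis) and predicted (synthesis) break indices, one value per
    word boundary. Lower means a closer match."""
    if len(observed) != len(predicted):
        raise ValueError("break sequences must cover the same word boundaries")
    return sum(abs(o - p) for o, p in zip(observed, predicted))


def rank_parses(observed: Sequence[int],
                predictions: Dict[str, Sequence[int]]) -> List[str]:
    """Return candidate parse labels ordered from best to worst match.
    Ties keep the insertion order of `predictions` (sorted() is stable)."""
    return sorted(predictions,
                  key=lambda name: mismatch_score(observed, predictions[name]))


if __name__ == "__main__":
    # Hypothetical break indices for a four-boundary utterance.
    observed_breaks = [1, 1, 4, 1]
    predicted_by_parse = {
        "parse_A": [1, 4, 1, 1],  # major break predicted after word 2
        "parse_B": [1, 1, 3, 1],  # major break predicted after word 3
    }
    print(rank_parses(observed_breaks, predicted_by_parse))
```

Running the example prints `['parse_B', 'parse_A']`: parse_B differs from the observed breaks by a total of 1, parse_A by 6. A probabilistic variant would replace the absolute-difference score with a likelihood of the observed breaks given each parse's predicted structure.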
