Autosegmental-metrical modeling of speech prosody is principally speaker-oriented. The production of pitch patterns, in systematic lab speech experiments as well as in spontaneous speech corpora, is analyzed in f0 tracings, from which sequences of H(igh) and L(ow) tones are abstracted. The perceptual relevance of these pitch categories in the transmission from speakers to listeners is largely not conceptualized; thus their modeling in speech communication lacks an essential component. In the metalinguistic task of labeling speech data with the annotation system ToBI, the “listener” plays a subordinate role as well: H and L, being suggestive of signal values, are allocated with reference to f0 curves and with little or no concern for perceptual classification by the trained labeler. The seriousness of this theoretical gap in the modeling of speech prosody is demonstrated by experimental data concerning f0-peak alignment. A number of papers in JASA have dealt with this topic from the point of view of synchronizing f0 with the vocal tract time course in acoustic output. However, perceptual experiments within the Kiel intonation model show that “early,” “medial,” and “late” peak alignments need to be defined perceptually and that in doing so microprosodic variation has to be filtered out from the surface signal.