This paper is concerned with how speech and music prosodies are matched to each other in folksong performance. We consider speech and music to be two more or less autonomous structures, both of which must be accommodated in singing performance according to a certain modus vivendi. Under some conditions, this necessity for coexistence may lead to a conflict between the two media which, we believe, is caused by the different ways in which speech and music employ the acoustical continua of pitch, duration, and timbre. In speech, contrastive sounds (phonemes) are primarily built up by exploiting differences in sound spectrum (for example, vowels can be distinguished by differences in their lower formant frequencies). In a melody, however, the pitch dimension is of primary importance for its structure. The building blocks of melodies are the scale steps, which result from discretization of the frequency continuum into a number of separate levels. It seems reasonable to suppose that a desirable result of the coexistence of speech and music prosodies in singing would be as perfect a match as possible between the two prosodies, unless certain "existential" requirements of one medium prevent this. A number of studies in the literature on the relationship between the two prosodies in singing seem to justify this prediction. In tone languages like Chinese or Japanese, linguistically relevant tone patterns tend to be matched to melodic contours in music (Yung 1983). In languages such as those of the Indo-European family, where a contrastive opposition exists between stressed and unstressed syllables, linguistic stress patterns tend to coincide with the metrical structure of stressed and unstressed positions in music (Palmer & Kelly 1992).
On the other hand, a classical example of a lack of fit between the two prosodies is the singing of high notes by female opera singers, where the high fundamental frequency makes it physically impossible to keep the vowel formant frequencies in the vicinity of their reference values from speech (Sundberg 1987). In this paper, we address the goodness of fit between metrical stress and word stress in the old Estonian folksong repertoire. Estonian is a Finno-Ugric language, which belongs to the larger group of Uralic languages. Stress in Estonian words always falls on the first syllable. An important characteristic of Estonian phonology is the use of contrastive duration. There are three contrastive quantity degrees in standard Estonian words, and the difference between them is semantically relevant. The three degrees are known as short, long, and overlong. In most cases, the difference between the long and overlong degrees is not indicated in the written language. In the spoken language, the three quantity degrees are manifested by the ratio of the duration of the initial syllable to the duration of the second syllable in a word (Lehiste 1968). An important characteristic of the Estonian folksong repertoire is the relative independence of the text and melody corpora. Almost every text from the former set may be combined with an arbitrary melody from the latter set. This implies that there must exist strong structural constraints which make such an interplay between texts and melodies available to a performer. The necessary structural framework is provided by the metre, which in Estonian folksongs is based on the contrast of long and short, rather than stressed and unstressed, syllables. The standard so-called