This article illustrates the importance of task context for evaluating the reliability and validity of observer assessments of warm/supportive marital interactions. Multiple observer ratings of 424 families were obtained within and across two observational task situations using the Iowa Family Interaction Rating Scales (Melby et al., 1990). The psychometric properties of observer ratings and family member reports were assessed simultaneously through the use of structural equation modeling. In general, the findings support the reliability of the global assessments of warm/supportive marital behaviors. As hypothesized, a marital discussion task elicited significantly higher levels of spousal warmth than the problem-solving task. Validity, however, varied across the two interactional contexts. These inconsistencies underscore the importance of task context, both for eliciting warm/supportive behaviors and for evaluating the validity of observational measures of such behaviors.

Although it seems intuitively appealing that warmth, support, and demonstrations of positive affect in marital interaction should promote couple satisfaction and harmony, the evidence for such an association is generally inconsistent and far less robust than support for the detrimental influence of hostile or negative marital interactions (Markman & Notarius, 1987). Observer ratings of marital warmth and support also tend to show lower reliability and validity than measures of other observed behaviors, such as hostile interactions (Hooley & Hahlweg, 1989; Markman, 1991). In this study, we test the hypothesis that the relatively weak association found in earlier studies between positive marital interaction and marital quality, as well as the lower reliability and validity obtained for measures of positive marital interaction, derive in large part from difficulties in adequately measuring these dimensions of spousal behavior.
Numerous family researchers have noted that it is not uncommon to find cross-method correspondence among measures from various sources (e.g., among observer-, self-, and spouse-report measures) (Furman, Jones, Buhrmester, & Adler, 1989; Jacob, Tennenbaum, & Krahn, 1987; Kenny, 1991; Moskowitz, 1990). Although a variety of factors may influence the degree of correspondence among such measures of spousal behaviors (Howe & Reiss, 1993; Markman, 1990), in this article we specifically examine the effects of the contexts, or trigger situations, used to obtain the ratings. Given the phenomenal growth in the use of observational methodology (Bennion, 1993) and the increasing recognition of the importance of using observational measures for assessing family processes (Baucom & Sayers, 1989; Hetherington, 1994), assessing the adequacy of various report sources is of particular significance. It is essential, however, that evaluations of insider (e.g., self or spouse) and outsider (e.g., trained observer) assessments of behaviors such as warmth and support in marriage be made using procedures that allow direct comparisons among measures obtained by different methods or from different sources. Reliability and validity of such measures should be considered simultaneously in order to ensure their adequacy (Zeller & Carmines, 1980).

THE PRESENT STUDY

The purpose of this study is to evaluate the reliability and validity of global observer ratings of positive, warm, and supportive marital interactions using the Iowa Family Interaction Rating Scales (Melby et al., 1990).
Prior analyses of observational ratings of hostility and coercion obtained using this coding system have demonstrated (a) the reliability and validity of such observational ratings obtained from a marital interaction task (Melby, Conger, Ge, & Warner, 1995) and (b) the ability of either a family problem-solving task or a marital discussion to yield acceptable observational assessments of such behaviors (Melby, Conger, Ge, & Warner, 1994). …