Abstract

Direct observation of behavior has traditionally been a core component of behavioral assessment. However, systematic observational data are not intrinsically reliable and valid. It is well known that observer accuracy and consistency can be influenced by a variety of factors. Therefore, interobserver agreement is frequently used to quantify the psychometric quality of behavioral observations. Two of the commonly used interobserver agreement indices, percentage of agreement and kappa, are reviewed. Although percentage agreement is popular due to its computational simplicity, kappa has been found to be a superior measure because it corrects for chance agreement among observers and allows for multiple observers and categories. A description of kappa and its computational methods is presented.

Direct observation of behavior has traditionally been a core component of behavioral assessment (Ciminero, 1986; Tryon, 1998). Originally, it was thought unnecessary to establish the reliability and validity of direct observations of behavior, since direct observation was assumed to be free of bias and valid by definition. However, various aspects of methodology can confound the data and therefore lead to invalid results (Hops, Davis, & Longoria, 1995). Kazdin (1977) reviewed research demonstrating that observer accuracy and reliability can be influenced by variables such as knowledge that accuracy is being checked, drift from the original definitions of the observed behavior, the complexity of the coding system being used, and observer expectancies combined with feedback. In addition, Wasik and Loven (1980) reported that characteristics of the recording procedures, characteristics of the observer, and characteristics unique to the observation setting are sources of inaccuracy that can jeopardize the reliability and validity of observational data. Consequently, Cone (1998) suggested that the quality of any observations of behavior must be determined regardless of the procedures used to quantify them.

INTEROBSERVER AGREEMENT

Researchers have identified procedures that can be used to measure the psychometric properties of data obtained from direct observation (Primavera, Allison, & Alfonso, 1997). The most common of these procedures is interobserver agreement (Skinner, Dittmer, & Howell, 2000). There are diverse opinions about what interobserver agreement actually measures. Hops et al. (1995) defined interobserver agreement as a measure of consistency and, therefore, as representing a form of reliability. In contrast, Alessi (1988) described interobserver agreement as an estimate of objectivity that indicates the degree to which the data reflect the behavior being observed rather than the behavior of the observer. Alessi's definition implies that interobserver agreement taps into aspects of validity. Suen (1988, 1990) indicated that interobserver agreement could serve as a measure of both reliability and validity, depending upon the degree to which two or more observers agree on occurrences or nonoccurrences, whether a criterion-referenced or norm-referenced orientation is used, and the ratio of random to systematic error.
Although there are divergent views about what agreement actually measures, it is generally accepted that it is fundamental to sound behavioral measurement for both researchers and practitioners (Bloom, Fischer, & Orme, 1999; Hayes, Barlow, & Nelson-Gray, 1999; Hoge, 1985; Hops et al., 1995; Kazdin, 2001; Kratochwill, Sheridan, Carlson, & Lasecki, 1999; Maag, 1999; McDermott, 1988; Salvia & Ysseldyke, 2001; Suen, 1988).

Assessing Interobserver Agreement

Many different methods of calculating interobserver agreement have been proposed (Berk, 1979; Hartmann, 1977; House, House, & Campbell, 1981; Shrout, Spitzer, & Fleiss, 1987). The two most commonly cited methods are percent of agreement and kappa.

Overall Percent of Agreement

The most frequently used method for determining interobserver agreement is overall percent of agreement (Berk, 1979; Hartmann, 1977; McDermott, 1988). …
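As a rough illustration of the two indices just named (the article's own computational treatment of kappa continues beyond this excerpt), the sketch below computes overall percent of agreement and Cohen's kappa for two observers coding the same series of intervals. The function names and the sample occurrence/nonoccurrence records are invented for illustration; kappa is computed as (observed agreement - chance agreement) / (1 - chance agreement).

import numpy as np

def percent_agreement(codes_a, codes_b):
    # Overall percent of agreement: the share of intervals on which the
    # two observers recorded the same category, expressed as a percentage.
    codes_a, codes_b = np.asarray(codes_a), np.asarray(codes_b)
    return float(np.mean(codes_a == codes_b)) * 100

def cohens_kappa(codes_a, codes_b):
    # Cohen's kappa for two observers and any number of categories:
    # (observed agreement - chance agreement) / (1 - chance agreement).
    categories = sorted(set(codes_a) | set(codes_b))
    index = {c: i for i, c in enumerate(categories)}
    table = np.zeros((len(categories), len(categories)))
    for a, b in zip(codes_a, codes_b):
        table[index[a], index[b]] += 1      # joint coding frequencies
    table /= table.sum()                    # convert to proportions
    p_observed = np.trace(table)                               # diagonal: both observers agree
    p_chance = np.sum(table.sum(axis=1) * table.sum(axis=0))   # agreement expected by chance
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical interval records: "+" = behavior occurred, "-" = did not occur.
observer_a = list("++-+--+-++-+---+-+--")
observer_b = list("++-+--+--+-+---+----")
print(f"percent agreement: {percent_agreement(observer_a, observer_b):.1f}%")  # 90.0%
print(f"kappa: {cohens_kappa(observer_a, observer_b):.2f}")                    # about 0.79

With these invented records the observers agree on 18 of 20 intervals (90%), but roughly half that agreement (about 52%) would be expected by chance given how often each observer coded occurrence and nonoccurrence, which is why the chance-corrected kappa comes out lower, at about 0.79.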
