S172 INTRODUCTION: Scientific study of the clinician's role in anesthesia patient safety requires objective measures of anesthesiologist performance. Previous studies support the value of real-time task analysis [Anesthesiology 80: 77, 1994 and 87: 144, 1997]. However, these techniques require further validation. The purpose of this study was to examine the intra- and inter-observer reliability of an established task analysis methodology during actual cases and on video recordings of the same cases.

METHODS: After IRB approval, routine general endotracheal anesthetics (133 +/- 5 min) performed by experienced certified registered nurse anesthetists were studied. One observer (VW) sat in the OR and used custom software to categorize, in real time, the activities of each subject into 34 discrete task categories (VW-OR). Concurrently, each case was videotaped using a comparable view. Two weeks later, the same observer performed off-line task analysis from the videotapes (VW-VT). A different observer (MD) also performed task analysis from the videotapes on 2 occasions 2 weeks apart (MD-VT1 & VT2). Case data were segmented into Induction (IND), Maintenance (MNT), and Emergence phases using strict criteria. Data were analyzed for percent and total time on task, task duration, and number of task occurrences per case. Two-way ANOVA followed by Newman-Keuls tests was used to assess significant intra- and inter-observer differences (P < 0.01 to account for multiple group comparisons).

RESULTS: There were no significant differences in percent or total time, task duration, or task occurrences between the OR and off-line analyses by the same observer (e.g., P > 0.7-1.0) during any phase of the anesthetics; concordance was greatest during IND. There were no significant differences in percent or total time, or in task duration, between observers in the off-line video analysis during any phase. For IND and total case, only 2 individual tasks differed (Table 1).
However, MD toggled between discrete tasks more rapidly than VW (e.g., occurrences of Observe Monitors during MNT: 140 +/- 28 vs. 84 +/- 14, P < 0.01). Additional analyses will be presented.

Table 1: Most Common Tasks: Percent of Total Case

DISCUSSION: There was very good intra-observer reproducibility and good concordance between real-time and off-line analyses. As expected, inter-observer variability was greater, although the differences generally involved only a few, less common, tasks. The use of strict task definitions and extensive observer training, including joint viewing and discussion of videotaped cases, appears to contribute to the good inter-rater reliability observed. Differences in style and interpretation between observers must be expected and accounted for in future studies. The ability to scientifically describe clinical performance, especially off-line, will provide a rational basis for the study of training strategies, work schedules, and anesthesia devices, hopefully leading to enhanced safety.
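
The four measures named in METHODS (percent time on task, total time on task, task duration, and number of task occurrences) can be illustrated with a minimal sketch. This is not the custom software used in the study; the event format (task name, start, end in seconds), the function name `summarize_tasks`, and the example log are assumptions made only for illustration. "Observe Monitors" is a task named in RESULTS; "Record Keeping" is a hypothetical category.

```python
from collections import defaultdict


def summarize_tasks(events):
    """Summarize a task log.

    events: iterable of (task_name, start_seconds, end_seconds) tuples.
    This record layout is an assumption, not the study's actual format.
    """
    total_time = defaultdict(float)
    occurrences = defaultdict(int)
    for task, start, end in events:
        total_time[task] += end - start  # accumulate time on task
        occurrences[task] += 1           # count each discrete occurrence

    case_time = sum(total_time.values())
    summary = {}
    for task, seconds in total_time.items():
        summary[task] = {
            "total_time": seconds,
            # Guard against a log whose events all have zero length.
            "percent_time": 100.0 * seconds / case_time if case_time else 0.0,
            "occurrences": occurrences[task],
            "mean_duration": seconds / occurrences[task],
        }
    return summary


# Hypothetical three-event log (times in seconds).
example_log = [
    ("Observe Monitors", 0, 30),
    ("Record Keeping", 30, 90),   # hypothetical task category
    ("Observe Monitors", 90, 100),
]
example_summary = summarize_tasks(example_log)
```

Two observers who agree on total time for a task can still differ in occurrences and mean duration if one switches between categories more often, which is the pattern reported for Observe Monitors during MNT.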