P35 PURPOSE. Improve the reliability of stroke subtype classification. BACKGROUND. We previously found that the reliability of stroke subtype assignment using the TOAST criteria was only fair. In a two-phase process to improve inter-rater agreement, we first developed a computerized algorithm for diagnostic categorization (Neurology 1999; 52: A243). In this second phase, residual variability related to differences in data abstraction was systematically assessed and reduced. METHODS. To assess baseline levels of inter-rater agreement, 4 physicians first retrospectively assigned TOAST subtype diagnoses for a randomly selected series of 14 patients. These diagnoses were then compared with those assigned by the computerized algorithm. Critical disagreements in data abstraction were identified, and the remaining variability was reduced through the development of standardized abstraction procedures. Inter- and intra-rater reliability were reassessed in separate randomly selected groups of cases. RESULTS. There was fair to moderate agreement between the algorithm-based and physician-assigned diagnoses for the baseline cases (kappa, k = 0.41, 95% CI: 0.28, 0.54), reflecting variation in the abstracted data and/or their interpretation. The development of standardized abstraction procedures improved reliability to a moderate level (k = 0.54, 95% CI: 0.26, 0.82). Critical disagreements (primarily due to differences in the interpretation of CT and MRI scans, cardiovascular evaluations, and ultrasound results) were identified and abstraction procedures revised accordingly. Reliability then improved to substantial levels of both inter-rater (k = 0.68, 95% CI: 0.44, 0.91) and intra-rater (k = 0.74, 95% CI: 0.61, 0.87) agreement. Half of the remaining disagreements were due to ambiguities in the medical record and half were related to errors in data abstraction. CONCLUSION. 
TOAST subtype classifications can be reliably assigned using a computerized algorithm with data obtained through standardized medical record abstraction. The residual variability can be addressed by having two raters classify each case and then identifying and resolving the reason(s) for any disagreements.
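
The agreement statistic and the two-rater review step described above can be illustrated with a minimal sketch. It assumes the unweighted Cohen's kappa for two raters, k = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance; the abstract does not state which kappa variant was used or how confidence intervals were derived. The function names, case IDs, and subtype labels below are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labeling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def disagreements(case_ids, rater_a, rater_b):
    """Return (case_id, label_a, label_b) for every case the raters split on."""
    return [(cid, a, b)
            for cid, a, b in zip(case_ids, rater_a, rater_b)
            if a != b]


# Hypothetical example: abbreviated subtype labels for six made-up cases.
case_ids = [1, 2, 3, 4, 5, 6]
rater_a = ["LAA", "CE", "SVO", "CE", "UNDET", "LAA"]
rater_b = ["LAA", "CE", "SVO", "LAA", "UNDET", "CE"]

kappa = cohen_kappa(rater_a, rater_b)
to_review = disagreements(case_ids, rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
print("cases to adjudicate:", to_review)
```

In this sketch the list returned by `disagreements` stands in for the cases whose records would be re-examined to find out whether the discrepancy came from an ambiguous medical record or from an abstraction error.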