Fleiss Kappa Statistic Research Articles

Reconstruction of segmental bone defects with bone transport is a well-established treatment. Mechanical complications at the docking site after frame removal are common. These complications include malunion, non-union, axial deviation and refracture. A simple tool to assess the healing of the docking site is currently lacking. The aim of this study is to evaluate the use of the modified RUST (mRUST) score in the setting of bone transport and to identify factors associated with an increased risk of docking site complications. This retrospective study was conducted at a single tertiary centre in South Africa and included 24 patients with a tibial bone defect treated with bone transport and a circular frame between 2014 and 2023. Demographic data, clinical and bone transport characteristics were recorded. Mechanical complications, such as fracture, non-union, any angulation >5°, shortening >5 mm, or any other complication requiring reoperation, were recorded. The mRUST was adapted as a ratio for the purpose of this study to overcome the common occurrence of cortices being obscured by the frame. The mRUST ratio was applied before and after frame removal for each patient by three appraisers. The groups with and without complications were compared with respect to bone transport characteristics, docking site configuration and mRUST ratio. The correlation of the score between radiographs before and after frame removal was assessed. The inter-rater reliability of the mRUST was analysed using the Fleiss kappa statistic for each cortex individually and the intraclass correlation coefficient (ICC) for the mRUST ratio. In this study, 20 men and 4 women with a median age of 26 years were included. The overall rate of mechanical complications after frame removal was 21.7%. Complications were all related to the docking site, with two angulations, two fractures and one non-union.
Demographics, bone transport characteristics and mRUST ratio before and after frame removal were similar between the two groups. Regarding the configuration of the docking site, an angle of 45° or more between the bone surfaces was associated with the occurrence of mechanical complications (p < 0.001). The correlation of the mean mRUST ratio before and after frame removal showed a moderate relationship, with a Spearman correlation coefficient of 0.50 (p = 0.13). The inter-rater reliability of the mRUST was "fair" (kappa 0.21-0.40) for the scoring of individual cortices, except for one score which was "slight" (kappa 0.00-0.20). The ICC of the mRUST ratio was 0.662 on radiographs with the frame, and 0.759 after frame removal. This study did not find the mRUST or mRUST ratio useful in assessing the healing of the docking site to decide on the best time to remove the frame. However, a notable finding was that the shape and orientation of the bone ends meeting at the docking site might well be relevant to decreasing complication rates. An angle of 45° or more between the bony surfaces may be associated with an increased risk of complications. It may be worthwhile considering reshaping these bone ends at the time of debridement or formal docking procedure to be more collinear, in order to reduce the potential for mechanical complications such as non-union, axial deviation or refracture at the docking site. Kummer A, Nieuwoudt L, Marais LC. Application of the Modified RUST Score in Tibial Bone Transport and Factors Associated with Docking Site Complications. Strategies Trauma Limb Reconstr 2024;19(2):73-81.
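For readers unfamiliar with the statistic named in the abstract above, the following is a minimal sketch of how a Fleiss kappa can be computed for several appraisers scoring the same items, together with the verbal bands the abstract quotes ("slight" 0.00-0.20, "fair" 0.21-0.40). It is not the authors' analysis code. The function names `fleiss_kappa` and `landis_koch_label` and the example scores are hypothetical, and the bands above 0.40 follow the commonly used Landis and Koch scale, which the abstract does not list in full.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of subjects, each a list of category labels
    (one label per rater). Every subject must have the same number of raters."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Per-subject agreement: share of rater pairs that chose the same category.
    p_i = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of ratings per category.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch_label(kappa):
    """Verbal label on the Landis and Koch scale (assumed here for all bands)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical data: three appraisers scoring one cortex on six radiographs.
example = [[2, 2, 3], [3, 3, 3], [1, 2, 2], [4, 4, 3], [2, 3, 3], [1, 1, 1]]
kappa = fleiss_kappa(example)
print(round(kappa, 3), landis_koch_label(kappa))
```

Taking raw labels per subject, rather than a precomputed count table, keeps the sketch close to how scores are usually recorded (one column per appraiser).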

Introduction: Heparin-induced thrombocytopenia (HIT) is an immune-mediated drug reaction that can cause thromboembolism in the setting of thrombocytopenia following heparin exposure. The “4T's score” has been validated to determine the pre-test probability of HIT and to assist in decision making around ordering testing for HIT. The 4T's scoring system requires an individual clinician to determine each component of the score. The objective of our study is to investigate the inter-rater reliability of calculating the 4T's score among clinicians. Methods: Through retrospective query of the Northwestern Enterprise Data Warehouse, we identified patients who had a HIT antibody (Ab) test ordered between 10/2019 and 10/2022 after implementation of a clinical decision support (CDS) tool that asked clinicians to calculate a 4T's score as part of HIT PF-4 Ab orders. From this cohort, an independent clinician randomly selected 15 patients. Four raters, including an attending hematologist, a hospital medicine attending, a hematology/oncology fellow, and an internal medicine resident, performed manual chart review of the randomly selected subjects to calculate a 4T's score. Data collected for each case included individual components of the 4T's score and overall 4T's score from each rater. In addition, we compared raters' scores to the 4T's score entered by the ordering clinician for the patient. We then categorized the numerical scores into validated pre-test probability categories: 0-3 as low, 4-5 as intermediate, and ≥6 as high. Inter-rater reliability for the categories was calculated using the Fleiss kappa statistic. Our study was approved by the Northwestern University Institutional Review Board. Results: Of 15 cases selected, 5 each were scored as low, intermediate, and high probability 4T's scores by the ordering clinician (associated with the HIT Ab order).
The overall agreement between the score categories for the 5 clinicians (4 raters and the ordering clinician) was 50.7%, with a Fleiss kappa statistic of 0.26 (95% CI [0.07-0.45]), indicating poor inter-rater reliability (Figure 1). Excluding the ordering clinician, the overall agreement in score category between the 4 raters was 58.9%, with a Fleiss kappa statistic of 0.38 (95% CI [0.14-0.62]). The same 4T's score pre-test probability category was calculated by the 4 raters in 5 cases, with only one case in which all four raters calculated the same numerical 4T's score. There was one case in which all 5 clinicians calculated the same 4T's score probability category, but none with the same numerical 4T's score. The hematology/oncology fellow had the highest inter-rater agreement with the original clinician (kappa 0.30 [-0.09-0.69]), whereas the internal medicine attending and resident had the lowest (both with kappa 0.00 [-0.37-0.37]). Of the 15 4T's scores, 12/15 calculated by the hematology attending, 5/15 by the internal medicine attending, 12/15 by the hematology/oncology fellow, and 11/15 by the internal medicine resident were lower than those calculated by the original ordering clinicians. Conclusion: Our study demonstrates poor inter-rater reliability of HIT 4T's score calculation across levels of training and specialty. Importantly, poor inter-rater reliability was seen across 4T's categories, which has implications for clinical management of patients undergoing evaluation for HIT. This suggests that different strategies are necessary to help clinicians better use the 4T's score.
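The categorization step described in the Methods (0-3 low, 4-5 intermediate, ≥6 high) and an "overall agreement" percentage can be sketched as below. This is an illustration only: the abstract does not say how overall agreement was computed, so mean pairwise agreement across raters is an assumption, the function names `four_ts_category` and `mean_pairwise_agreement` are hypothetical, the example scores are invented, and the 0-8 bound reflects the usual range of the 4T's score rather than anything stated in the abstract.

```python
from itertools import combinations


def four_ts_category(score):
    """Map a numerical 4T's score to its pre-test probability category,
    using the cut-offs quoted in the abstract. The 0-8 range is assumed."""
    if not 0 <= score <= 8:
        raise ValueError("4T's score is expected to lie between 0 and 8")
    if score <= 3:
        return "low"
    if score <= 5:
        return "intermediate"
    return "high"


def mean_pairwise_agreement(cases):
    """Average, over cases, of the share of rater pairs that assigned the
    same category. One possible definition of 'overall agreement' (assumed)."""
    per_case = []
    for scores in cases:
        categories = [four_ts_category(s) for s in scores]
        pairs = list(combinations(categories, 2))
        per_case.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_case) / len(per_case)


# Hypothetical scores: each inner list holds four raters' scores for one case.
example_cases = [
    [2, 3, 1, 0],  # every rater lands in "low"
    [4, 6, 5, 5],  # three "intermediate", one "high"
    [7, 6, 3, 5],  # "high", "high", "low", "intermediate"
]
print(f"{mean_pairwise_agreement(example_cases):.1%}")
```

Because agreement is computed on categories rather than raw scores, two raters who differ by one point can still agree (4 and 5 are both intermediate) or disagree (3 and 4 straddle a cut-off), which is why category-level and numerical agreement can diverge as they do in the Results.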

Articles published on Fleiss Kappa Statistic

Is ChatGPT a reliable tool in Autoimmune Hepatitis?

Assessment of Inter-observer Reproducibility of the Residual Cancer Burden Index and Neoadjuvant Chemotherapy Response in Breast Carcinoma

Kinetics of EBV antibody-based NPC risk scores in Taiwan NPC multiplex families.

Predicting pedestrian-vehicle interaction severity at unsignalized intersections

An inter-assessor reliability study on the categorization and staging of pressure injuries

Application of the Modified RUST Score in Tibial Bone Transport and Factors Associated with Docking Site Complications.

Trauma-related preventable death; data analysis and panel review at a level 1 trauma centre in Amsterdam, the Netherlands.

Are Generative Pretrained Transformer 4 Responses to Developmental Dysplasia of the Hip Clinical Scenarios Universal? An International Review.

Diagnosis of Unruptured Intracranial Aneurysms Using Proton-Density Magnetic Resonance Angiography: A Comparison With High-Resolution Time-of-Flight Magnetic Resonance Angiography.

Low Inter-Rater Reliability of Calculating the 4T's Score for Heparin Induced Thrombocytopenia

Assessing clinical reasoning in the OSCE: pilot-testing a novel oral debrief exercise

Assessment of the Tumor-Stroma Ratio and Tumor-Infiltrating Lymphocytes in Colorectal Cancer: Inter-Observer Agreement Evaluation.

Interrater Agreement of CT Grading of Blunt Splenic Injuries: Does the AAST Grading Need to Be Reimagined?

Predictor of failed cancer resuscitation by expert agreement

Grade Group accuracy is improved by extensive prostate biopsy sampling, but unrelated to prostatectomy specimen sampling or use of immunohistochemistry.

Reliability assessment of the 2018 classification case definitions of peri-implant health, peri-implant mucositis, and peri-implantitis.

Multilevel calibration procedure for the oral health national multicenter survey in primary teeth.

Impact of postoperative baseline MRI on diagnostic confidence and performance in detecting local recurrence of soft-tissue sarcoma of the limb

Clinical utility of intraoperative Arterial Spin Labeling for resection control in brain tumor surgery at 3 T.

Accuracy and reliability of tele-ultrasonography in detecting gastrointestinal obstruction in dogs and cats.
