Abstract

I read with interest the recent paper by Ippolito et al (2017), published in this journal. The authors aimed to compare the diagnostic value of whole-body ultra-low-dose computed tomography (WBULDCT) with that of spinal magnetic resonance imaging (SMRI) for identifying spinal bone marrow involvement in patients with multiple myeloma (MM) (Ippolito et al, 2017). According to their results, the overall concordance between WBULDCT and SMRI in lesion detection was 76·7%, detecting or excluding involvement of the axial skeleton in 25 and 35 out of the 35 cases studied, while in 2/35 patients WBULDCT and SMRI were discordant regarding axial skeleton involvement. The concordance in the spinal distribution of lesions was 61·6% for the cervical, 71·5% for the dorsal, 86·4% for the lumbar and 94·4% for the sacral spine, while for the pattern of disease it was 56·1% for the focal and 88·7% for the combined pattern. Cohen's kappa was 0·85 (P < 0·001), which the authors interpreted as excellent agreement (Ippolito et al, 2017).
I would like to raise 3–4 methodological and statistical points regarding diagnostic value. Diagnostic value is not simply a matter of agreement: it comprises both diagnostic accuracy (validity) and diagnostic precision (reliability, agreement). The first methodological point is therefore accuracy, which should be assessed using well-known statistical estimates. For qualitative variables, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive and negative likelihood ratios, overall diagnostic accuracy and the diagnostic odds ratio (ratio of true to false results) are among the most appropriate estimates of validity. For clinical purposes, however, the added diagnostic value of a test should also be reported, for example using receiver operating characteristic (ROC) analysis, because conventional validity estimates can be excellent, or even perfect, while the added diagnostic value is clinically negligible.
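The validity estimates listed above can all be derived from a single 2 × 2 table that cross-classifies the index test against a reference standard. The sketch below is a minimal illustration under stated assumptions: it assumes each patient has been classified as positive or negative by both WBULDCT and a reference standard (SMRI is used only as an example; whether it can serve as one is itself an assumption), and the names `TwoByTwo` and `validity_estimates` as well as the cell counts are hypothetical, not data from Ippolito et al (2017).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical 2 x 2 table: index test (e.g. WBULDCT) vs an assumed reference standard.

    tp: index positive, reference positive
    fp: index positive, reference negative
    fn: index negative, reference positive
    tn: index negative, reference negative
    """
    tp: int
    fp: int
    fn: int
    tn: int


def validity_estimates(t: TwoByTwo) -> dict[str, float]:
    """Return common validity estimates; assumes no cell is zero (avoids division by zero)."""
    sens = t.tp / (t.tp + t.fn)  # P(test positive | disease present)
    spec = t.tn / (t.tn + t.fp)  # P(test negative | disease absent)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": t.tp / (t.tp + t.fp),  # P(disease present | test positive)
        "npv": t.tn / (t.tn + t.fn),  # P(disease absent | test negative)
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
        "accuracy": (t.tp + t.tn) / (t.tp + t.fp + t.fn + t.tn),
        # Diagnostic odds ratio: product of true results over product of false results.
        "diagnostic_odds_ratio": (t.tp * t.tn) / (t.fp * t.fn),
    }


if __name__ == "__main__":
    # Illustrative counts only; not taken from the study under discussion.
    table = TwoByTwo(tp=40, fp=5, fn=10, tn=45)
    for name, value in validity_estimates(table).items():
        print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity is 0·80, specificity 0·90 and the diagnostic odds ratio 36, which also equals the ratio of the positive to the negative likelihood ratio. None of these figures says anything about the added value of one imaging technique over another; that requires a comparative analysis such as ROC analysis.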
Reliability (precision or agreement), a distinct methodological aspect of diagnostic value, should be assessed with an appropriate test. For qualitative variables, weighted kappa should be applied (Szklo & Nieto, 2007; Sabour, 2016, 2017a, 2017b; Sabour & Ghassemi, 2016). Cohen's kappa is not the most appropriate statistic for assessing agreement, because it has two important weaknesses. First, it depends on the prevalence in each category, so tables with the same percentages of concordant and discordant cells can yield different kappa values. In both situations given in Table 1 (A and B), 90% of the cells are concordant and 10% are discordant, yet the kappa values differ (0·44, moderate, and 0·80, very good, respectively). Second, the kappa value also depends on the number of categories (Szklo & Nieto, 2007; Sabour, 2016, 2017a, 2017b; Sabour & Ghassemi, 2016). Weighted kappa can therefore be suggested as a more appropriate statistic in such situations. Moreover, validity is assessed with a global (average) approach, whereas reliability should be assessed with an individual-based approach.
Ippolito et al (2017) concluded that WBULDCT is a useful diagnostic tool for detecting spinal involvement in patients with MM, offering detailed information about extra-axial involvement that could be missed with dedicated SMRI. I have discussed above the limitations of their approach and the lack of evidence for such a conclusion. The authors' conclusion should be supported by analyses that address these statistical and methodological issues; otherwise, misdiagnosis and mismanagement of patients cannot be avoided.
The author declares that they have no competing interests.
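The prevalence dependence of kappa described above can be reproduced with two hypothetical 2 × 2 tables, each with 90% concordant and 10% discordant cells. The counts used here (85/5/5/5 and 45/5/5/45) are an assumption chosen because they yield the quoted values of 0·44 and 0·80; they are not necessarily the counts in Table 1. The sketch also includes a weighted kappa with linear or quadratic agreement weights for ordered categories; the function names and the three-category example table are hypothetical.

```python
from __future__ import annotations


def _marginals(table: list[list[int]]) -> tuple[int, list[float], list[float]]:
    """Total count plus row and column marginal proportions of a square table."""
    n = sum(map(sum, table))
    k = len(table)
    rows = [sum(table[i]) / n for i in range(k)]
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    return n, rows, cols


def cohen_kappa(table: list[list[int]]) -> float:
    """Unweighted Cohen's kappa; table[i][j] = subjects rated i by rater 1 and j by rater 2."""
    n, rows, cols = _marginals(table)
    p_obs = sum(table[i][i] for i in range(len(table))) / n  # observed agreement
    p_exp = sum(r * c for r, c in zip(rows, cols))  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


def weighted_kappa(table: list[list[int]], scheme: str = "linear") -> float:
    """Weighted kappa for ordered categories using agreement weights.

    linear:    w_ij = 1 - |i - j| / (k - 1)
    quadratic: w_ij = 1 - (i - j)^2 / (k - 1)^2
    """
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    n, rows, cols = _marginals(table)
    k = len(table)

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return 1 - (d if scheme == "linear" else d * d)

    p_obs = sum(weight(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    p_exp = sum(weight(i, j) * rows[i] * cols[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    # Hypothetical tables, both with 90% concordant and 10% discordant cells.
    situation_a = [[85, 5], [5, 5]]
    situation_b = [[45, 5], [5, 45]]
    print(f"A: kappa = {cohen_kappa(situation_a):.2f}")  # prints A: kappa = 0.44
    print(f"B: kappa = {cohen_kappa(situation_b):.2f}")  # prints B: kappa = 0.80

    # Hypothetical ordered three-grade rating (grade 0 / 1 / 2) by two readers.
    ordinal = [[20, 5, 0], [3, 15, 4], [0, 2, 11]]
    print(f"unweighted kappa = {cohen_kappa(ordinal):.2f}")
    print(f"linear weighted kappa = {weighted_kappa(ordinal):.2f}")
```

The only difference between situations A and B is how the cases are distributed across the two categories: chance-expected agreement is 0·82 in A and 0·50 in B, which drives the different kappa values. With only two categories the linear weights reduce to plain agreement, so the two functions return the same value; the weights change the result only when there are three or more ordered categories.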
