We thank Yuliang Liu and colleagues, and Marc Dewey and Peter Schlattmann, for their letters regarding our study on the diagnostic application of deep learning algorithms in head CT imaging.1

In response to Liu and colleagues, we would like to highlight that a balanced dataset would not have been representative of the population, and therefore using such a dataset for validation was not desirable. We compared the performance of our algorithms with the original radiology reports (for our qure25k dataset) and with the consensus of three independent radiologists (for the CQ500 dataset). We therefore believe that the performance of our algorithms was validated against an adequate level of expertise. However, we do acknowledge that error rates cannot completely reflect the effect on the patient. We would like to emphasise that our validation datasets (which the machine learning community refers to as test datasets) were completely independent of training and hyperparameter tuning. Furthermore, the CQ500 dataset was acquired from independent sources to ensure that there was no overfitting to our training data sources. Regarding the comments on reporting disease features and accounting for more than one disease, we agree that studies into these aspects would be useful, but they are beyond the scope of the current work.

Dewey and Schlattmann point out the low positive predictive values of the algorithms when applied to our data. Positive predictive value is highly dependent on the prevalence of the target condition.
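This dependence can be made concrete with a short calculation via Bayes' theorem. The sketch below uses illustrative sensitivity and specificity values (assumptions for demonstration, not figures from the study) to show how the same classifier yields very different positive predictive values at different prevalences, and reproduces the false-positive-fraction arithmetic discussed in this reply:

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value via Bayes' theorem."""
    tp = prevalence * sensitivity                # true positives per scan in the population
    fp = (1 - prevalence) * (1 - specificity)    # false positives per scan in the population
    return tp / (tp + fp)

def false_positive_fraction(prevalence, ppv_value):
    """Fraction of all scans that are false positives,
    assuming every true case is flagged (sensitivity = 1)."""
    return prevalence * (1 - ppv_value) / ppv_value

# The same classifier (90% sensitivity, 95% specificity) at two prevalences:
print(round(ppv(0.30, 0.90, 0.95), 3))    # common condition: PPV is about 0.885
print(round(ppv(0.001, 0.90, 0.95), 3))   # rare condition:   PPV is about 0.018

# The worked example from the reply: prevalence 0.1%, PPV 20%
print(round(false_positive_fraction(0.001, 0.20), 4))  # 0.004, ie, 0.4% of all scans
```

As the output shows, sensitivity and specificity alone do not determine the positive predictive value; prevalence dominates it for rare conditions, which is why the false-positive fraction of all scans can remain small even when PPV looks low.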
Indeed, the versions of the algorithms that we used had low positive predictive values for the relatively rare target conditions in our study, such as midline shift.1 However, we disagree that positive predictive value (precision) as a performance metric completely reflects the false-positive situation in a clinical workflow. Consider the following scenario: an algorithm for detecting a rare condition (with a prevalence of 0·1%) has a positive predictive value of 20%; the percentage of false positives seen by the user would then be 0·1% × (100 − 20) / 20 = 0·4%, which is still a small fraction of all the scans. Therefore, the proportion of false positives is perhaps a better metric than precision for establishing the effect on clinical workflow. Nevertheless, we acknowledge that positive predictive value as a metric reflects the user experience and the reliability of the algorithms. We have also realised through our work with artificial intelligence algorithms that specificity, and therefore the area under the receiver operating characteristic curve (AUC), are metrics that are not fit for model selection for target conditions with low prevalence. Average precision is an interesting alternative to AUC,2 and one that we might include in our future studies. Through this and other novel methods, and by using an increased amount of data, we seek to increase the accuracy of the algorithms in future studies.

Both authors are employees of qure.ai.

References
1 Chilamkurthy S, Ghosh R, Tanamala S, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet 2018; 392: 2388–96.
2 Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 2015; 10: e0118432.
