Introduction Assessing the malignancy of focal liver lesions (FLLs) is an important yet challenging part of routine patient care. Contrast-enhanced ultrasound (CEUS) is a highly reliable tool, but its accuracy depends heavily on the examiner's expertise. Advances in artificial intelligence have opened the door to algorithms that could aid the diagnostic process. In this study, we evaluate the performance of a weakly supervised deep learning model in classifying FLLs as malignant or benign.
Methods Our retrospective feasibility study was based on a cohort of patients from a tertiary care hospital in Germany who underwent routine CEUS examination to evaluate the malignancy of FLLs. Using 5-fold cross-validation, we trained a weakly supervised attention-based multiple instance learning algorithm to distinguish malignant from benign liver tumors, relying only on case labels and no manual annotations. We combined the models from the cross-validation cycle that performed best on average and tested this combined model on a held-out test set. We evaluated its performance using standard performance metrics and developed explainability methods to gain insight into the model's decisions.
Results We enrolled 370 patients, who contributed a total of 955,938 images extracted from CEUS videos or manually captured during the examination. Our combined model identified malignant lesions with a mean area under the receiver operating characteristic curve of 0.844 in the cross-validation experiment and 0.94 (95% CI 0.89-0.99) in the held-out test set. On the held-out test set, the accuracy, sensitivity, specificity, and F1 score of the combined model for detecting malignant lesions were 80.0%, 81.8%, 84.6%, and 0.81, respectively. Our exploratory analysis using visual explainability methods suggested that the model prioritizes information that is also highly relevant to expert clinicians in this task.
Conclusions Weakly supervised deep learning can classify malignancy in CEUS examinations of FLLs and might one day assist clinicians' decision-making in routine clinical practice.
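For readers unfamiliar with attention-based multiple instance learning, the core idea can be sketched as follows: each case is a "bag" of image embeddings, a small attention network scores every image, and the softmax-weighted average of the embeddings is classified with only the case label as supervision. This is a minimal illustration of a common formulation of attention pooling, not the study's implementation; the function names, the parameters `V`, `w`, `clf_w`, `clf_b`, and all dimensions are assumptions made for the example.

```python
import math
import random


def attention_weights(instances, V, w):
    """Score each instance embedding and normalise the scores with a softmax.

    instances: list of embeddings (each a list of d floats), one per image
    V: list of hidden-unit weight rows, each of length d (hypothetical parameter)
    w: list of floats, one per hidden unit (hypothetical parameter)
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(a * b for a, b in zip(h, row))) for row in V]
        scores.append(sum(a * b for a, b in zip(hidden, w)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_bag(instances, weights):
    """Attention-weighted average of the instance embeddings (one vector per case)."""
    dim = len(instances[0])
    return [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)]


def predict_malignancy(instances, V, w, clf_w, clf_b):
    """Return (probability of the positive class, per-image attention weights)."""
    weights = attention_weights(instances, V, w)
    bag = pool_bag(instances, weights)
    logit = sum(a * b for a, b in zip(bag, clf_w)) + clf_b
    return 1.0 / (1.0 + math.exp(-logit)), weights


# Toy usage with random, untrained parameters (illustrative only).
random.seed(0)
frames = [[random.gauss(0, 1) for _ in range(8)] for _ in range(12)]  # 12 images, 8-dim
V = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
w = [random.gauss(0, 1) for _ in range(4)]
clf_w = [random.gauss(0, 1) for _ in range(8)]
prob, attn = predict_malignancy(frames, V, w, clf_w, 0.0)
```

In a sketch like this, the attention weights also give a simple form of explainability: images with larger weights contributed more to the case-level prediction.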