Fully automating LI-RADS on MRI with deep learning-guided lesion segmentation, feature characterization, and score inference.

Ke Wang, Yuehua Liu, Wenjin Yu, Jiayin Zhou, Hongxin Chen, Xiaoying Wang

doi:10.3389/fonc.2023.1153241

Abstract

Deep learning has great potential and practical significance for the radiology community. To explore how deep learning methods can fit into the current Liver Imaging Reporting and Data System (LI-RADS), this paper provides a complete and fully automatic deep learning solution for LI-RADS and investigates its performance in liver lesion segmentation and classification. To achieve this, a deep learning study design process is formulated, comprising clinical problem formulation, identification of the corresponding deep learning tasks, data acquisition, data preprocessing, and algorithm validation. For segmentation, a supervised UNet++-based approach was developed using 33,078 raw images obtained from 111 patients and collected from 2010 to 2017. The key innovation, compared with prior work, is that the proposed framework introduces an additional step, feature characterization, before LI-RADS score classification. In this step, a feature characterization network with multi-task learning and a joint training strategy is proposed, followed by an inference module that generates the final LI-RADS score. Both the liver segmentation and feature characterization models were evaluated, and a comprehensive statistical analysis was conducted with detailed discussion. The median Dice score for liver lesion segmentation reached 0.879. Depending on the threshold, recall ranged from 0.7 to 0.9, while precision remained above 0.9. Segmentation performance was also evaluated at the patient level and the lesion level; the (precision, recall) results at the patient level were much better, at approximately (1, 0.9). Lesion classification achieved an overall accuracy of 76%, and most misclassifications occurred between neighboring categories, which is reasonable since it is inherently difficult to distinguish LI-RADS 4 from LI-RADS 5.
In addition to investigating the performance of the proposed model itself, extensive comparison experiments were conducted. This study shows that the proposed framework with feature characterization greatly improves diagnostic performance, which validates the effectiveness of the added feature characterization step. Because this step outputs the feature characterization results instead of simply generating a final score, it opens up the black box of the proposed algorithm and thus improves its explainability.
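
For readers less familiar with the metrics reported above, the following is a minimal sketch of how a Dice score and a (precision, recall) pair are conventionally computed. It is not the authors' evaluation code: the function names `dice_score` and `precision_recall`, the use of flattened binary masks, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (hypothetical helper).

    Dice = 2 * |pred AND truth| / (|pred| + |truth|).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from detection counts (hypothetical helper).

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy 2x2 masks, flattened row by row (illustrative values only).
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_score(predicted, reference))

    # Toy counts: 9 detected lesions, no false alarms, 1 missed lesion.
    print(precision_recall(tp=9, fp=0, fn=1))
```

The same counting applies whether the unit is a pixel, a lesion, or a patient; only the definition of a true positive changes, which is why the lesion-level and patient-level figures can differ.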

Full Text