Gallstone disease is a common condition affecting a substantial number of individuals globally. Risk factors include obesity, rapid weight loss, diabetes, and genetic predisposition. Gallstones can lead to serious complications such as calculous cholecystitis, cholangitis, and biliary pancreatitis, and they are associated with an increased risk of gallbladder (GB) cancer. Abdominal ultrasound (US) is the primary diagnostic method due to its affordability and high sensitivity, while computed tomography (CT) and magnetic resonance cholangiopancreatography (MRCP) offer higher sensitivity and specificity. This review assesses the diagnostic accuracy of machine learning (ML) technologies in detecting gallstones. The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. An electronic search was conducted in PubMed, Cochrane Library, Scopus, and Embase, covering literature published up to April 2024 and restricted to human studies. Search strings combined all relevant keywords and Medical Subject Headings (MeSH) terms using Boolean operators. Additionally, reference lists were manually screened. The review included all study designs and performance indicators. It excluded studies that did not involve artificial intelligence (AI)/ML algorithms, used non-imaging diagnostic modalities, analyzed microscopic images, or addressed other diseases, as well as editorials, commentaries, reviews, and studies with incomplete data. Data extraction covered study characteristics, imaging modalities, ML architectures, training, validation, and testing procedures, performance metrics, reference standards, and reported advantages and drawbacks of the diagnostic models. The electronic search yielded 1,002 records, of which 34 underwent full-text screening, resulting in the inclusion of seven studies. One additional study identified through citation searching brought the total to eight articles.
All studies but one employed a retrospective cross-sectional design; the remaining study was prospective. Imaging modalities included ultrasonography (four studies), computed tomography (three studies), and magnetic resonance cholangiopancreatography (one study). Patient numbers ranged from 60 to 2,386, and the number of images used for training, validation, and testing of the diagnostic models ranged from 60 to 17,560. All studies utilized neural networks, predominantly convolutional neural networks (CNNs). Expert radiologists served as the reference standard for image labelling, and model performance was compared with that of physicians or other algorithms. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were the most commonly reported performance indicators. In conclusion, while the reviewed machine learning models show promising performance in diagnosing gallstones, substantial work remains to ensure their reliability and generalizability across diverse clinical settings. These models have evident potential to improve diagnostic accuracy and efficiency, but careful consideration of their limitations and rigorous validation are essential steps toward their successful integration into clinical practice.
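
For readers less familiar with the performance indicators named above, the following minimal Python sketch shows how sensitivity, specificity, PPV, and NPV follow from the four cells of a binary confusion matrix, with the radiologist label taken as the reference standard. The class and function names (`ConfusionCounts`, `diagnostic_metrics`) and the counts are hypothetical and chosen only for illustration; they are not taken from any of the included studies.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of model predictions against the reference standard (illustrative)."""
    tp: int  # gallstones present, model predicts present
    fp: int  # gallstones absent, model predicts present
    fn: int  # gallstones present, model predicts absent
    tn: int  # gallstones absent, model predicts absent


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the four standard indicators from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "ppv": _ratio(c.tp, c.tp + c.fp),          # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
    }


# Hypothetical counts for a 200-image test set, not data from the review.
counts = ConfusionCounts(tp=90, fp=20, fn=10, tn=80)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the model itself, whereas PPV and NPV also depend on the proportion of gallstone-positive cases in the test set, so the latter two should be compared across studies with caution.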