Diagnostic accuracy of artificial intelligence models for temporomandibular joint anomalies on MRI: a systematic review and meta-analysis.
Artificial intelligence (AI) techniques are increasingly applied to magnetic resonance imaging (MRI) for detecting temporomandibular joint (TMJ) anomalies; however, their overall diagnostic accuracy and generalizability remain uncertain. We aimed to systematically review and meta-analyse the diagnostic performance of AI models for TMJ anomaly detection on MRI and to identify factors influencing model performance. A comprehensive search of PubMed, Scopus, Embase, and Web of Science was conducted for studies published between January 2015 and September 2025. Two reviewers independently screened studies and extracted data. Eligible studies developed and tested AI, machine learning, or deep learning models on human TMJ MRI and reported quantitative performance metrics. Risk of bias was assessed using the QUADAS-2 tool. Pooled sensitivity and specificity were estimated using a bivariate random-effects model, while pooled accuracy was derived using a logit transformation. Heterogeneity (I²) was explored through subgroup analyses by model architecture and validation strategy. Fourteen studies were included in the systematic review, of which six met the criteria for meta-analysis. Across these six studies, 18 models were analyzed for accuracy, 29 for sensitivity, and 24 for specificity. The pooled diagnostic accuracy was 0.487 (95% CI 0.403-0.571), with pooled sensitivity and specificity of 0.399 (95% CI 0.348-0.450) and 0.399 (95% CI 0.343-0.456), respectively, all showing substantial heterogeneity (I² > 90%). Subgroup analyses indicated that advanced deep learning architectures such as ResNet-18, Inception v3, and EfficientNet-b4 achieved higher and more consistent diagnostic performance for detecting TMJ anomalies on MRI.
These findings highlight the potential of AI-assisted MRI interpretation to improve diagnostic consistency, efficiency, and early detection of TMJ pathology. However, substantial heterogeneity and limited external validation currently limit clinical translation. Standardized multicenter studies and transparent model validation are essential to ensure reliable integration of AI tools into clinical TMJ imaging workflows.
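The pooling approach described above (proportions combined on the logit scale under a random-effects model, with I² quantifying heterogeneity) can be sketched as follows. The per-model counts are hypothetical, and the DerSimonian-Laird estimator of the between-study variance is one common choice; the abstract does not say which estimator the authors used.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def pool_logit_proportions(events, totals):
    """Random-effects pooling of proportions on the logit scale
    (DerSimonian-Laird tau^2); returns (pooled proportion, I^2 in %)."""
    y = [logit(e / n) for e, n in zip(events, totals)]
    # Within-study variances of the logit via the delta method
    v = [1 / e + 1 / (n - e) for e, n in zip(events, totals)]
    w = [1 / vi for vi in v]
    # Fixed-effect estimate feeds Cochran's Q
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Re-weight with tau^2 added to each within-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return inv_logit(pooled), i2

# Hypothetical accuracy counts (correct, evaluated) for three models
acc, i2 = pool_logit_proportions([45, 60, 30], [100, 110, 70])
print(f"pooled accuracy = {acc:.3f}, I^2 = {i2:.1f}%")
```

The bivariate sensitivity/specificity model mentioned in the abstract additionally models the correlation between the two metrics, which this univariate sketch does not attempt.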
- Research Article
- 10.1002/hed.70000
- Aug 4, 2025
- Head & neck
This systematic review and meta-analysis evaluated the performance of imaging-based artificial intelligence (AI) models in diagnosing preoperative cervical lymph node metastasis (LNM) in clinically node-negative (cN0) papillary thyroid carcinoma (PTC). We conducted a literature search in PubMed, Embase, and Web of Science until February 25, 2025. Studies were selected that focused on imaging-based AI models for predicting cervical LNM in cN0 PTC. The diagnostic performance metrics were analyzed using a bivariate random-effects model, and study quality was assessed with the QUADAS-2 tool. From 671 articles, 11 studies involving 3366 patients were included. Ultrasound (US)-based AI models showed pooled sensitivity of 0.79 and specificity of 0.82, significantly higher than radiologists (p < 0.001). CT-based AI models demonstrated sensitivity of 0.78 and specificity of 0.89. Imaging-based AI models, particularly US-based AI, show promising diagnostic performance. There is a need for further multicenter prospective studies for validation. PROSPERO: (CRD420251063416).
- Research Article
- 10.1002/micr.70143
- Nov 13, 2025
- Microsurgery
We aimed to systematically evaluate the diagnostic performance of artificial intelligence (AI) models in predicting postoperative complications following flap surgery, and to compare the efficacy of different input modalities used in model training. A comprehensive literature search was conducted across PubMed, Embase, Scopus, and Web of Science to identify studies utilizing AI for flap monitoring and postoperative complication prediction. A total of 12 studies comprising 18,520 patients and 32,148 input data points were included. Pooled sensitivity, specificity, likelihood ratios, and SROC curves were calculated using a bivariate random-effects model. The meta-analysis revealed a pooled sensitivity of 0.78 [95% CI: 0.54-0.91] and a pooled specificity of 0.88 [95% CI: 0.76-0.94]. The positive and negative likelihood ratios were 6.36 [95% CI: 2.54-15.91] and 0.25 [95% CI: 0.10-0.64], respectively. The area under the SROC curve was 0.91 [95% CI: 0.88-0.93], indicating excellent overall diagnostic performance. AI models, particularly those incorporating photographic data and deep learning architectures, demonstrate high diagnostic accuracy and hold promise as adjunct tools for postoperative flap monitoring.
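The likelihood ratios quoted above follow directly from sensitivity and specificity via the identities LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec. Plugging in the pooled point estimates is a useful sanity check; note that the abstract's LR+ of 6.36 differs slightly from this naive arithmetic because the bivariate model estimates the ratios jointly rather than from the point estimates.

```python
# Pooled point estimates reported in the abstract
sens, spec = 0.78, 0.88
lr_pos = sens / (1 - spec)   # = 0.78 / 0.12 = 6.5
lr_neg = (1 - sens) / spec   # = 0.22 / 0.88 = 0.25
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```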
- Research Article
- 10.54364/aaiml.2024.43159
- Jan 1, 2024
- Advances in Artificial Intelligence and Machine Learning
Introduction: The accurate prediction of mandibular bone growth is crucial in orthodontics and maxillofacial surgery, impacting treatment planning and patient outcomes. Traditional methods often fall short due to their reliance on linear models and clinician expertise, which are prone to human error and variability. Artificial intelligence (AI) and machine learning (ML) offer advanced alternatives, capable of processing complex datasets to provide more accurate predictions. This systematic review examines the efficacy of AI and ML models in predicting mandibular growth compared to traditional methods. Methods: A systematic review was conducted following the PRISMA guidelines, focusing on studies published up to July 2024. Databases searched included PubMed, Embase, Scopus, and Web of Science. Studies were selected based on their use of AI and ML algorithms for predicting mandibular growth. A total of 31 studies were identified, with 6 meeting the inclusion criteria. Data were extracted on study characteristics, AI models used, and prediction accuracy. The risk of bias was assessed using the QUADAS-2 tool. Results: The review found that AI and ML models generally provided high accuracy in predicting mandibular growth. For instance, the LASSO model achieved an average error of 1.41 mm for predicting skeletal landmarks. However, not all AI models outperformed traditional methods; in some cases, deep learning models were less accurate than conventional growth prediction models. Discussion: The variability in datasets and study designs across the included studies posed challenges for comparing AI models’ effectiveness. Additionally, the complexity of AI models may limit their clinical applicability. Despite these challenges, AI and ML show significant promise in enhancing predictive accuracy for mandibular growth. Conclusion: AI and ML models have the potential to revolutionize mandibular growth prediction, offering greater accuracy and reliability than traditional methods. However, further research is needed to standardize methodologies, expand datasets, and improve model interpretability for clinical integration.
- Research Article
- 10.1177/20552076251330528
- Mar 1, 2025
- Digital health
Artificial Intelligence (AI) has demonstrated significant potential in transforming psychiatric care by enhancing diagnostic accuracy and therapeutic interventions. Psychiatry faces challenges like overlapping symptoms, subjective diagnostic methods, and personalized treatment requirements. AI, with its advanced data-processing capabilities, offers innovative solutions to these complexities. This study systematically reviewed and meta-analyzed the existing literature to evaluate AI's diagnostic accuracy and therapeutic efficacy in psychiatric care, focusing on various psychiatric disorders and AI technologies. Adhering to PRISMA guidelines, the study included a comprehensive literature search across multiple databases. Empirical studies investigating AI applications in psychiatry, such as machine learning (ML), deep learning (DL), and hybrid models, were selected based on predefined inclusion criteria. The outcomes of interest were diagnostic accuracy and therapeutic efficacy. Statistical analysis employed fixed- and random-effects models, with subgroup and sensitivity analyses exploring the impact of AI methodologies and study designs. A total of 14 studies met the inclusion criteria, representing diverse AI applications in diagnosing and treating psychiatric disorders. The pooled diagnostic accuracy was 85% (95% CI: 80%-87%), with ML models achieving the highest accuracy, followed by hybrid and DL models. For therapeutic efficacy, the pooled effect size was 84% (95% CI: 82%-86%), with ML excelling in personalized treatment plans and symptom tracking. Moderate heterogeneity was observed, reflecting variability in study designs and populations. The risk of bias assessment indicated high methodological rigor in most studies, though challenges like algorithmic biases and data quality remain. AI demonstrates robust diagnostic and therapeutic capabilities in psychiatry, offering a data-driven approach to personalized mental healthcare. 
Future research should address ethical concerns, standardize methodologies, and explore underrepresented populations to maximize AI's transformative potential in mental health.
- Research Article
- 10.1002/cre2.70115
- Feb 1, 2025
- Clinical and experimental dental research
Given the complexity of temporomandibular joint disorders (TMDs) and their overlapping symptoms with other conditions, an accurate diagnosis necessitates a thorough examination, which can be time-consuming and resource-intensive. Consequently, innovative diagnostic tools are required to increase TMD diagnosis efficiency and precision. Therefore, the purpose of this umbrella review was to examine the existing evidence about the usefulness of artificial intelligence (AI) in TMD diagnosis. A comprehensive search of the literature was performed from inception to November 30, 2024, in PubMed-MEDLINE, Embase, and Scopus databases. This review evaluated systematic reviews (SRs) and meta-analyses (MAs) that reported TMD patients/datasets, any AI model as intervention, no treatment or placebo as comparator, and accuracy, sensitivity, specificity, or predictive value of AI models as outcome. The extracted data were complemented with narrative synthesis. Out of 1497 search results, this umbrella review included five studies. One of the five articles was an SR while the other four were SRMAs. Three studies focused on patients with temporomandibular joint (TMJ) problems as a group, whereas two were specific to temporomandibular joint osteoarthritis (TMJOA). The included studies reported the use of imaging datasets as samples, including cone-beam computed tomography (CBCT), magnetic resonance imaging (MRI), and panoramic radiography. The studies reported an accuracy level ranging from 0.59 to 1. Four studies reported sensitivity levels ranging from 0.76 to 0.80. Four studies reported specificity values ranging from 0.63 to 0.95 for TMJ conditions. However, only one study provided the area under the curve (AUC) in the diagnosis of TMDs. AI has the ability to provide faster, more accurate, sensitive, and objective diagnosis of TMJ conditions. However, performance depends on the AI models and datasets used. Therefore, before implementing AI models in clinical practice, it is essential for researchers to extensively refine and evaluate AI applications.
- Research Article
- 10.1038/s41598-024-69848-9
- Aug 14, 2024
- Scientific Reports
This study investigated the usefulness of deep learning-based automatic detection of temporomandibular joint (TMJ) effusion using magnetic resonance imaging (MRI) in patients with temporomandibular disorder and whether the diagnostic accuracy of the model improved when patients’ clinical information was provided in addition to MRI images. The sagittal MR images of 2948 TMJs were collected from 1017 women and 457 men (mean age 37.19 ± 18.64 years). The TMJ effusion diagnostic performances of three convolutional neural networks (from-scratch, fine-tuning, and freeze schemes) were compared with those of human experts based on areas under the curve (AUCs) and diagnosis accuracies. The fine-tuning model with proton density (PD) images showed acceptable prediction performance (AUC = 0.7895), and the from-scratch (0.6193) and freeze (0.6149) models showed lower performances (p < 0.05). The fine-tuning model had excellent specificity compared to the human experts (87.25% vs. 58.17%). However, the human experts were superior in sensitivity (80.00% vs. 57.43%) (all p < 0.001). In gradient-weighted class activation mapping (Grad-CAM) visualizations, the fine-tuning scheme focused more on effusion than on other structures of the TMJ, and the sparsity was higher than that of the from-scratch scheme (82.40% vs. 49.83%, p < 0.05). The Grad-CAM visualizations confirmed that the model learned from important features in the TMJ area, particularly around the articular disc. Two fine-tuning models on PD and T2-weighted images showed that the diagnostic performance did not improve compared with using PD alone (p < 0.05). Diverse AUCs were observed across each group when the patients were divided according to age (0.7083–0.8375) and sex (male: 0.7576, female: 0.7083). The prediction accuracy of the ensemble model was higher than that of the human experts when all the data were used (74.21% vs. 67.71%, p < 0.05).
A deep neural network (DNN) was developed to process multimodal data, including MRI and patient clinical data. Analysis of four age groups with the DNN model showed that the 41–60 age group had the best performance (AUC = 0.8258). The fine-tuning model and DNN were optimal for judging TMJ effusion and may be used to prevent true negative cases and aid in human diagnostic performance. Assistive automated diagnostic methods have the potential to increase clinicians’ diagnostic accuracy.
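The sparsity comparison above (82.40% vs. 49.83%) presumably counts the share of near-zero activations in the Grad-CAM heat map; the abstract does not give the exact definition or threshold, so the metric below is an illustrative assumption, applied to a hypothetical normalized map.

```python
def sparsity(heatmap, eps=0.05):
    """Percent of normalized Grad-CAM activations below a near-zero threshold.

    `eps` is an assumed cutoff; the source paper does not report its value."""
    near_zero = sum(1 for v in heatmap if v < eps)
    return 100.0 * near_zero / len(heatmap)

# Hypothetical flattened, min-max-normalized heat map
heatmap = [0.0, 0.01, 0.02, 0.9, 0.7, 0.0, 0.03, 0.6, 0.0, 0.04]
print(f"sparsity = {sparsity(heatmap):.1f}%")
```

Under this definition, a higher sparsity means the network's attention is concentrated on a small region (here, ideally the effusion) rather than spread across the whole joint.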
- Research Article
- 10.1016/j.survophthal.2024.09.003
- Sep 30, 2024
- Survey of Ophthalmology
We focus on the utility of artificial intelligence (AI) in the management of macular hole (MH). We synthesize 25 studies, comprehensively reporting on each AI model’s development strategy, validation, tasks, performance, strengths, and limitations. All models analyzed ophthalmic images, and 5 (20%) also analyzed clinical features. Study objectives were categorized based on 3 stages of MH care: diagnosis, identification of MH characteristics, and postoperative predictions of hole closure and vision recovery. Twenty-two (88%) AI models underwent supervised learning, and the models were most often deployed to determine an MH diagnosis. None of the articles applied AI to guiding treatment plans. AI model performance was compared to other algorithms and to human graders. Of the 10 studies comparing AI to human graders (i.e., retinal specialists, general ophthalmologists, and ophthalmology trainees), 5 (50%) reported equivalent or higher performance. Overall, AI analysis of images and clinical characteristics in MH demonstrated high diagnostic and predictive accuracy. Convolutional neural networks comprised the majority of included AI models, including those which were high performing. Future research may consider validating algorithms to propose personalized treatment plans and explore clinical use of the aforementioned algorithms.
- Research Article
- 10.37934/sijphpc.3.1.110121b
- Mar 15, 2025
- Semarak International Journal of Public Health and Primary Care
Prostate cancer (PCa), the second leading cause of cancer death in men globally, highlights the need for effective early detection methods. While prostate needle biopsy remains the gold standard, it is invasive and relies on the skill of the practitioner. Magnetic resonance imaging (MRI) is currently the primary method for pre-biopsy detection, and artificial intelligence (AI) models are emerging as promising tools to enhance diagnostic accuracy. This systematic review evaluated the diagnostic performance of MRI-based AI models for detecting and classifying prostate cancer, comparing them to histopathological results. Out of 1153 studies, 30 met the criteria for inclusion. Detection models demonstrated high performance with AUC values ranging from 0.78 to 1.00, while classification models had AUC values between 0.64 and 0.93. Sensitivity varied significantly, with detection models showing 69.6% to 100% and classification models showing 46.81% to 100%. Comparisons between AI models and radiologists’ interpretations showed similar performance levels in ten studies. Overall, AI models were more effective in detecting prostate cancer than in classifying it, suggesting their potential to improve diagnostic accuracy. However, the variability in performance highlights the need for careful integration of AI into clinical practice and radiological workflows.
- Research Article
- 10.1186/s13054-025-05468-7
- Jun 6, 2025
- Critical Care
Background: Large language models (LLMs) have demonstrated potential in assisting clinical decision-making. However, studies evaluating LLMs’ diagnostic performance on complex critical illness cases are lacking. We aimed to assess the diagnostic accuracy and response quality of an artificial intelligence (AI) model, and to evaluate its potential benefits in assisting critical care residents with differential diagnosis of complex cases. Methods: This prospective comparative study collected challenging critical illness cases from the literature. Critical care residents from tertiary teaching hospitals were recruited and randomly assigned to non-AI-assisted physician and AI-assisted physician groups. We selected a reasoning model, DeepSeek-R1, for our study. We evaluated the model’s response quality using Likert scales, and we compared the diagnostic accuracy and efficiency between groups. Results: A total of 48 cases were included. Thirty-two critical care residents were recruited, with 16 residents assigned to each group. Each resident handled an average of 3 cases. DeepSeek-R1’s responses received median Likert grades of 4.0 (IQR 4.0–5.0; 95% CI 4.0–4.5) for completeness, 5.0 (IQR 4.0–5.0; 95% CI 4.5–5.0) for clarity, and 5.0 (IQR 4.0–5.0; 95% CI 4.0–5.0) for usefulness. The AI model’s top diagnosis accuracy was 60% (29/48; 95% CI 0.456–0.729), with a median differential diagnosis quality score of 5.0 (IQR 4.0–5.0; 95% CI 4.5–5.0). Top diagnosis accuracy was 27% (13/48; 95% CI 0.146–0.396) in the non-AI-assisted physician group versus 58% (28/48; 95% CI 0.438–0.729) in the AI-assisted physician group. Median differential quality scores were 3.0 (IQR 0–5.0; 95% CI 2.0–4.0) without and 5.0 (IQR 3.0–5.0; 95% CI 3.0–5.0) with AI assistance. The AI model showed higher diagnostic accuracy than residents, and AI assistance significantly improved residents’ accuracy. The residents’ diagnostic time significantly decreased with AI assistance (median, 972 s; IQR 570–1320; 95% CI 675–1200) versus without (median, 1920 s; IQR 1320–2640; 95% CI 1710–2370). Conclusions: For diagnostically difficult critical illness cases, DeepSeek-R1 generates high-quality information, achieves reasonable diagnostic accuracy, and significantly improves residents’ diagnostic accuracy and efficiency. Reasoning models appear to be promising diagnostic adjuncts in intensive care units.
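The proportion-style confidence intervals above (e.g., top-diagnosis accuracy 29/48 with 95% CI 0.456–0.729) can be reproduced approximately with a standard binomial interval. The abstract does not state which interval method was used, so the Wilson score interval below is an assumption; it lands close to the reported bounds.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Top-diagnosis accuracy of the AI model: 29 of 48 cases correct
lo, hi = wilson_ci(29, 48)
print(f"29/48 = {29/48:.3f}, 95% CI {lo:.3f}-{hi:.3f}")
```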
- Supplementary Content
- 10.2196/78310
- Jan 13, 2026
- Journal of Medical Internet Research
Background: The global rise of metabolic associated fatty liver disease reflects the urgent need for accurate, noninvasive diagnostic approaches. The invasive nature of liver biopsy and the limited sensitivity of ultrasound in detecting early steatosis highlight a critical diagnostic gap. Artificial intelligence (AI) has emerged as a transformative tool, enabling the automated detection and grading of hepatic steatosis (HS) from medical imaging data. Objective: This review aims to quantitatively evaluate the diagnostic performance of AI models for HS, explore sources of interstudy heterogeneity, and provide an appraisal of their clinical applicability, translational potential, and the major barriers impeding widespread implementation. Methods: PubMed, Cochrane Library, Embase, Web of Science, and IEEE Xplore databases were searched until September 24, 2025. Studies using AI for HS diagnosis, meeting the predefined PIRT (Patient Selection, Index Test, Reference Standard, Flow and Timing) framework, and providing extractable data were included. Diagnostic performance indicators, including sensitivity, specificity, and the area under the summary receiver operating characteristic curve (AUC), were extracted and quantitatively synthesized. Meta-analyses were conducted using a bivariate random-effects model. The methodological quality and risk of bias were evaluated using the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool. Heterogeneity was assessed through the I² statistic, bivariate box plots, 95% PIs, and threshold effect analysis. Clinical applicability was examined using the Fagan nomogram and likelihood ratio tests. Results: A total of 36 eligible studies were identified, of which 33 (comprising 36 cohorts) were included in the subgroup analyses. Results demonstrated excellent diagnostic accuracy of AI models, with a summary sensitivity of 0.95 (95% CI 0.93-0.96), specificity of 0.93 (95% CI 0.91-0.94), and an AUC of 0.98 (95% CI 0.96-0.99). Clinical applicability analysis (positive likelihood ratio >10; negative likelihood ratio <0.1) supported AI’s strong potential for both confirming and excluding HS. However, substantial heterogeneity was observed across studies (I² >75%). According to QUADAS-2, a high risk of bias, particularly in the Patient Selection domain (44.4%), may have contributed to the overestimation of real-world performance. Subgroup analyses showed that deep learning models significantly outperformed traditional machine learning approaches (AUC: 0.98 vs 0.94). Models using ultrasound or histopathology references, retrospective designs, transfer learning, and public datasets achieved the highest accuracy (AUC 0.98-0.99) but contributed to interstudy heterogeneity. Conclusions: AI demonstrates remarkable potential for noninvasive screening and assessment of HS, especially in primary care. Nonetheless, clinical translation remains limited by performance variability, retrospective designs, a lack of external validation, and practical barriers such as data privacy and workflow integration. Future studies should prioritize prospective multicenter trials and standardized external validation to bridge the gap between current evidence and clinical application. The key innovation of this review lies in establishing a unified, modality-agnostic analytical framework that integrates evidence beyond single-modality evaluations.
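The rule-in/rule-out claim above (LR+ > 10, LR- < 0.1) maps onto post-test probabilities via the Fagan-nomogram calculation: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. The 30% pre-test probability below is hypothetical; the likelihood ratios are derived from the review's summary sensitivity and specificity.

```python
def post_test_prob(pre_test_prob, lr):
    """Fagan-nomogram update: odds in, multiply by LR, probability out."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

sens, spec = 0.95, 0.93        # summary estimates from the review
lr_pos = sens / (1 - spec)     # about 13.6 (> 10: strong rule-in)
lr_neg = (1 - sens) / spec     # about 0.054 (< 0.1: strong rule-out)
prevalence = 0.30              # hypothetical pre-test probability
print(f"post-test (positive scan): {post_test_prob(prevalence, lr_pos):.2f}")
print(f"post-test (negative scan): {post_test_prob(prevalence, lr_neg):.3f}")
```

A positive result lifts a 30% pre-test probability to roughly 85%, while a negative result drops it to about 2%, which is what "confirming and excluding HS" amounts to in practice.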
- Discussion
- 10.1016/s2589-7500(19)30124-4
- Sep 24, 2019
- The Lancet Digital Health
Human versus machine in medicine: can scientific literature answer the question?
- Research Article
- 10.1002/lt.24867
- Nov 20, 2017
- Liver transplantation : official publication of the American Association for the Study of Liver Diseases and the International Liver Transplantation Society
Early detection of small hepatocellular carcinoma (HCC) lesions can improve long-term patient survival. A systematic review and meta-analysis of the diagnostic performance of gadoxetic acid disodium (Gd-EOB-DTPA)-enhanced magnetic resonance imaging (MRI) and multidetector computed tomography (MDCT) was performed in diagnosing small HCCs measuring up to 2 cm (≤2 cm). Two investigators searched multiple databases for studies in which the performances of either Gd-EOB-DTPA-enhanced MRI or MDCT were reported with sufficient data to construct 2 × 2 contingency tables for diagnosing HCCs up to 2 cm on a per-lesion or per-patient level. Diagnostic performances were quantitatively pooled by a bivariate random-effects model with further meta-regression and subgroup analyses. A total of 27 studies (14 on Gd-EOB-DTPA-enhanced MRI, 9 on MDCT, and 4 on both) were included, enrolling a total of 1735 patients on Gd-EOB-DTPA-enhanced MRI and 1781 patients on MDCT. Gd-EOB-DTPA-enhanced MRI demonstrated significantly higher overall sensitivity than did MDCT (0.96 versus 0.65; P < 0.01), without substantial loss of specificity (0.94 versus 0.98; P > 0.05). Area under the summary receiver operating characteristic curve was 0.97 with Gd-EOB-DTPA-enhanced MRI and 0.85 with MDCT. Regarding Gd-EOB-DTPA-enhanced MRI, sensitivity was significantly higher for studies from non-Asian countries than Asian countries (0.96 versus 0.93; P < 0.01), for retrospective studies than prospective studies (0.95 versus 0.91; P < 0.01), and for those with a Gd-EOB-DTPA injection rate ≥ 1.5 mL/s than < 1.5 mL/s (0.97 versus 0.90; P < 0.01). In conclusion, Gd-EOB-DTPA-enhanced MRI demonstrated higher sensitivity and overall diagnostic accuracy than MDCT, and thus should be the preferred imaging modality for diagnosing small HCCs measuring up to 2 cm. Liver Transplantation 23:1505-1518, 2017 AASLD.
- Research Article
- 10.3389/fonc.2022.1026216
- Oct 12, 2022
- Frontiers in Oncology
Purpose: The purpose of this study was to evaluate the diagnostic accuracy of artificial intelligence (AI) models with magnetic resonance imaging (MRI) in predicting pathological complete response (pCR) to neoadjuvant chemoradiotherapy (nCRT) in patients with rectal cancer. Furthermore, we assessed the methodological quality of the included models. Methods: We searched PubMed, Embase, Cochrane Library, and Web of Science for studies published before 21 June 2022, without any language restrictions. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) and Radiomics Quality Score (RQS) tools were used to assess the methodological quality of the included studies. We calculated pooled sensitivity and specificity using random-effects models, used I² values to measure heterogeneity, and performed subgroup analyses to explore potential sources of heterogeneity. Results: We selected 21 papers for inclusion in the meta-analysis from 1562 retrieved publications, with a total of 1873 people in the validation groups. The meta-analysis showed that AI models based on MRI predicted pCR to nCRT in patients with rectal cancer with a pooled area under the curve (AUC) of 0.91 (95% CI, 0.88-0.93), a pooled sensitivity of 0.82 (95% CI, 0.71-0.90), and a pooled specificity of 0.86 (95% CI, 0.80-0.91). In the subgroup analysis, the pooled AUC of the deep learning (DL) models was 0.97 and the pooled AUC of the radiomics models was 0.85; the pooled AUC of the combined models with clinical factors was 0.92, and the pooled AUC of the radiomics models alone was 0.87. The mean RQS score of the included studies was 10.95, accounting for 30.4% of the total score. Conclusions: Radiomics is a promising noninvasive method with high value in predicting pathological response to nCRT in patients with rectal cancer. DL models have higher predictive accuracy than radiomics models, and combined models incorporating clinical factors have higher diagnostic accuracy than radiomics models alone. In the future, prospective, large-scale, multicenter investigations using radiomics approaches will strengthen the prediction of pCR. Systematic Review Registration: https://www.crd.york.ac.uk/prospero/, identifier CRD42021285630.
- Research Article
- 10.1007/s00330-025-11492-6
- Jan 1, 2025
- European Radiology
Objectives: This meta-research study explored the availability of artificial intelligence (AI) models from development studies published in leading radiology journals in 2022, with availability defined as the transparent reporting of relevant technical details, such as model architecture and weights, necessary for independent replication. Materials and Methods: A systematic search of Ovid Medline and Embase was conducted to identify AI model development studies published in five leading radiology journals in 2022. Data were extracted on study characteristics, model details, and code and model-sharing practices. The proportion of AI studies sharing their models was analyzed. Logistic regression analyses were employed to explore associations between study characteristics and model availability. Results: Of 268 studies reviewed, 39.9% (n = 107) made their models available. Deep learning (DL) models exhibited particularly low availability, with only 11.5% (n = 13) of the 113 studies being fully available. Training codes for DL models were provided in 22.1% (n = 25), suggesting limited ability to train DL models with one’s own data. Multivariable logistic regression analysis showed that the use of traditional regression-based models (odds ratio [OR], 17.11; 95% CI: 5.52, 53.05; p < 0.001) was associated with higher availability, while radiomics package usage (OR, 0.27; 95% CI: 0.11, 0.65; p = 0.003) was associated with lower availability. Conclusion: The availability of AI models in radiology publications remains suboptimal, especially for DL models. Enforcing model-sharing policies, enhancing external validation platforms, addressing commercial restrictions, and providing demos for commercial models in open repositories are necessary to improve transparency and replicability in radiology AI research. Key Points: Question: The study addresses the limited availability of AI models in radiology, especially DL models, which impacts external validation and clinical reliability. Findings: Only 39.9% of radiology AI studies made their models available, with DL models showing particularly low availability at 11.5%. Clinical relevance: Improving the availability of radiology AI models is essential for enabling external validation, ensuring reliable clinical application, and advancing patient care by fostering robust and transparent AI systems.
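A reported odds ratio and its confidence interval are enough to back-calculate the Wald z-statistic (and hence sanity-check a quoted p-value), because the interval is symmetric on the log-odds scale. Applying this to the OR of 17.11 (95% CI: 5.52, 53.05) above:

```python
import math

def z_from_or_ci(or_hat, lo, hi, z_crit=1.96):
    """Back-calculate the Wald z-statistic from an OR and its 95% CI.

    On the log scale, SE = (ln(hi) - ln(lo)) / (2 * z_crit)."""
    se = (math.log(hi) - math.log(lo)) / (2 * z_crit)
    return math.log(or_hat) / se

z = z_from_or_ci(17.11, 5.52, 53.05)
print(f"z = {z:.2f}")  # well above 3.29, consistent with the reported p < 0.001
```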
- Research Article
- 10.5167/uzh-108409
- Jan 1, 2014
Background: Temporomandibular joint (TMJ) arthritis is common in children with juvenile idiopathic arthritis (JIA), but often clinically asymptomatic. Magnetic resonance imaging (MRI) is the most reliable examination method, but requires sedation in young children. The aim of our study was to evaluate whether early TMJ MRI will change the treatment of patients with newly diagnosed JIA. Methods: Single-center chart review of all patients with a diagnosis of JIA between January 2007 and December 2010. Results: We found 147 patients with newly diagnosed JIA during this period. In 111 (76%) at least 1 MRI of the TMJ was available. Reasons why no TMJ MRI was done were parents’ refusal (10), MRI of other locations (7), fixed dental appliances (16), and unclear cause (3). A diagnosis of TMJ arthritis based on increased joint enhancement on MRI was made in 91/111 (82%) patients. The first MRI was done at a median interval of 5 months from the diagnosis of JIA, and 61/111 patients (55%) required sedation for their first MRI. TMJ arthritis was diagnosed in 53/61 (87%) patients requiring sedation and in 34/50 (68%) patients without sedation (p = 0.003). Following the first TMJ MRI, intra-articular steroid injections were performed into 107 TMJs of 60 patients. 48/147 (33%) patients received at least one DMARD to control their disease, and in 9/48 (19%) the first DMARD was started following the first TMJ MRI. Factors associated with TMJ involvement as demonstrated by MRI were JIA subtype (p = 0.007) and a younger age at diagnosis of JIA (p = 0.04). Conclusion: In our cohort of newly diagnosed JIA patients, TMJ arthritis was very common. Early TMJ MRI led to changes in treatment in 62% of patients, with additional joint injections in 60 patients and start of systemic medication in 9 patients. We especially recommend performing TMJ MRI in young children even if they require sedation, as they have an increased rate of TMJ involvement.