Diagnostic Efficacy of Artificial Intelligence Models for Predicting Endodontic Outcome - A Systematic Review and Meta-Analysis.

Abstract

This systematic review was conducted to evaluate the diagnostic ability of artificial intelligence (AI) models for predicting an endodontic radiographically inferred condition. The review was performed in accordance with the PRISMA-DTA checklist and registered in PROSPERO (CRD42025631782). Databases were searched from January 2000 to December 2024 for studies comparing the diagnostic ability of AI models with that of dental specialists. Risk of bias (ROB) was assessed with the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool, and meta-analysis was performed in Meta-DiSc 1.4 and Review Manager 5.3 to obtain pooled sensitivity, pooled specificity, and a summary receiver operating characteristic (SROC) curve. Five studies were included for analysis and showed moderate to low ROB. The AI models evaluated as index tests included artificial neural networks, convolutional neural networks, deep learning, and deep learning networks. Meta-analysis revealed a pooled sensitivity of 0.83 (95% confidence interval (CI) 0.31-1.00) and a pooled specificity of 0.33 (95% CI 0.03-0.81); the area under the SROC curve (AUC) was 0.54. The included AI models were trained and evaluated on radiographic data only; the findings therefore reflect the diagnostic accuracy of image-based AI in detecting radiographic signs associated with endodontic disease rather than comprehensive clinical prognoses. While AI demonstrated moderate sensitivity for identifying these endodontic conditions, the low specificity indicates a high false-positive rate when AI is used as a standalone radiograph-based tool. These models may serve as adjunctive screening aids but require prospective validation that integrates clinical and treatment variables before they can be used to predict longitudinal treatment outcomes.
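As a concrete illustration of the pooled metrics reported above, the sketch below computes per-study sensitivity and specificity from 2×2 diagnostic tables and pools them by simple sample-size weighting. The study counts are hypothetical and the pooling is deliberately simplified; Meta-DiSc and Review Manager fit more elaborate (e.g., random-effects and bivariate) models for the actual review.

```python
# Illustrative sketch only: per-study sensitivity/specificity from 2x2 counts
# (tp, fp, fn, tn), pooled by simple denominator weighting. The counts below
# are hypothetical and NOT taken from the review.

def sens_spec(tp, fp, fn, tn):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def pooled(studies):
    """Pool estimates by weighting each study by its relevant denominator."""
    sens_num = sum(tp for tp, fp, fn, tn in studies)
    sens_den = sum(tp + fn for tp, fp, fn, tn in studies)
    spec_num = sum(tn for tp, fp, fn, tn in studies)
    spec_den = sum(tn + fp for tp, fp, fn, tn in studies)
    return sens_num / sens_den, spec_num / spec_den

# Hypothetical (tp, fp, fn, tn) counts for three studies:
studies = [(40, 10, 8, 22), (25, 15, 5, 15), (30, 20, 6, 14)]
sens, spec = pooled(studies)
```

A high pooled sensitivity with low pooled specificity, as reported in the review, corresponds to many true positives but also many false positives across the pooled 2×2 tables.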

Similar Papers
  • Research Article
  • 10.3389/fonc.2024.1363812
Deep learning or radiomics based on CT for predicting the response of gastric cancer to neoadjuvant chemotherapy: a meta-analysis and systematic review.
  • Mar 27, 2024
  • Frontiers in oncology
  • Zhixian Bao + 4 more

Artificial intelligence (AI) models, clinical models (CM), and the integrated model (IM) are utilized to evaluate the response to neoadjuvant chemotherapy (NACT) in patients diagnosed with gastric cancer. The objective was to determine the diagnostic accuracy of the AI model and to compare the accuracy of AI, CM, and IM through a comprehensive summary of head-to-head comparative studies. PubMed, Web of Science, Cochrane Library, and Embase were systematically searched until September 5, 2023, to compile English-language studies without regional restrictions. The quality of the included studies was evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria. Forest plots were used to illustrate the findings of diagnostic accuracy, while Hierarchical Summary Receiver Operating Characteristic curves were generated to estimate sensitivity (SEN) and specificity (SPE). Meta-regression was applied to analyze heterogeneity across the studies. To assess the presence of publication bias, Deeks' funnel plot and an asymmetry test were employed. A total of 9 studies, comprising 3313 patients, were included for the AI model, with 7 head-to-head comparative studies involving 2699 patients. Across the 9 studies, the pooled SEN for the AI model was 0.75 (95% confidence interval (CI): 0.66, 0.82), and SPE was 0.77 (95% CI: 0.69, 0.84). Meta-regression revealed that the cut-off value, approach to predicting response, and gold standard might be sources of heterogeneity. In the head-to-head comparative studies, the pooled SEN for AI was 0.77 (95% CI: 0.69, 0.84) with SPE at 0.79 (95% CI: 0.70, 0.85). For CM, the pooled SEN was 0.67 (95% CI: 0.57, 0.77) with SPE at 0.59 (95% CI: 0.54, 0.64), while for IM, the pooled SEN was 0.83 (95% CI: 0.79, 0.86) with SPE at 0.69 (95% CI: 0.56, 0.79). Notably, there were no statistical differences in pairwise comparisons, except that IM exhibited higher SEN than AI while maintaining a similar level of SPE.
In the Receiver Operating Characteristic analysis subgroup, the CT-based Deep Learning (DL) subgroup, and the National Comprehensive Cancer Network (NCCN) guideline subgroup, the AI model exhibited higher SEN but lower SPE compared to the IM. Conversely, in the training cohort subgroup and the internal validation cohort subgroup, the AI model demonstrated lower SEN but higher SPE than the IM. The subgroup analysis underscored that factors such as the number of cohorts, cohort type, cut-off value, approach to predicting response, and choice of gold standard could impact the reliability and robustness of the results. AI has demonstrated its viability as a tool for predicting the response of gastric cancer (GC) patients to NACT. Furthermore, the CT-based DL model was sensitive in extracting tumor features and predicting the response. The results of the subgroup analysis also supported these conclusions. Large-scale, rigorously designed diagnostic accuracy studies and head-to-head comparative studies are anticipated. PROSPERO, CRD42022377030.

  • Research Article
  • 10.1371/journal.pone.0288631
Artificial intelligence for detecting temporomandibular joint osteoarthritis using radiographic image data: A systematic review and meta-analysis of diagnostic test accuracy.
  • Jul 14, 2023
  • PloS one
  • Liang Xu + 4 more

In this review, we assessed the diagnostic efficiency of artificial intelligence (AI) models in detecting temporomandibular joint osteoarthritis (TMJOA) using radiographic imaging data. Following the PRISMA guidelines, a systematic review of studies published between January 2010 and January 2023 was conducted using PubMed, Web of Science, Scopus, and Embase. Articles on the accuracy of AI in detecting TMJOA or degenerative changes from radiographic imaging were selected. The characteristics and diagnostic information of each article were extracted. The quality of studies was assessed with the QUADAS-2 tool. Pooled sensitivity, specificity, and the summary receiver operating characteristic (SROC) curve were calculated. Of 513 records identified through the database search, six met the inclusion criteria. The pooled sensitivity, specificity, and area under the curve (AUC) were 80%, 90%, and 92%, respectively. Substantial heterogeneity between AI models mainly arose from imaging modality, ethnicity, sex, AI techniques, and sample size. These findings suggest that AI models have enormous potential to diagnose TMJOA automatically from radiographic images. However, further studies are needed to evaluate AI more thoroughly.

  • Research Article
  • 10.1016/j.jacr.2021.06.025
Real-World Surveillance of FDA-Cleared Artificial Intelligence Models: Rationale and Logistics.
  • Feb 1, 2022
  • Journal of the American College of Radiology
  • Keith J Dreyer + 2 more


  • Peer Review Report
  • 10.7554/elife.83662.sa1
Decision letter: Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study
  • Dec 12, 2022
  • Larisa V Suturina + 1 more


  • Peer Review Report
  • 10.7554/elife.83662.sa0
Editor's evaluation: Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study
  • Dec 12, 2022
  • Larisa V Suturina

Abstract

Background: In infertility treatment, blastocyst morphological grading is commonly used in clinical practice for blastocyst evaluation and selection, but has shown limited predictive power on live birth outcomes of blastocysts. To improve live birth prediction, a number of artificial intelligence (AI) models have been established. Most existing AI models for blastocyst evaluation only used images for live birth prediction, and the area under the receiver operating characteristic (ROC) curve (AUC) achieved by these models has plateaued at ~0.65.

Methods: This study proposed a multimodal blastocyst evaluation method using both blastocyst images and patient couple’s clinical features (e.g., maternal age, hormone profiles, endometrium thickness, and semen quality) to predict live birth outcomes of human blastocysts. To utilize the multimodal data, we developed a new AI model consisting of a convolutional neural network (CNN) to process blastocyst images and a multilayer perceptron to process patient couple’s clinical features. The data set used in this study consists of 17,580 blastocysts with known live birth outcomes, blastocyst images, and patient couple’s clinical features.

Results: This study achieved an AUC of 0.77 for live birth prediction, which significantly outperforms related works in the literature. Sixteen out of 103 clinical features were identified to be predictors of live birth outcomes and helped improve live birth prediction. Among these features, maternal age, the day of blastocyst transfer, antral follicle count, retrieved oocyte number, and endometrium thickness measured before transfer are the top five features contributing to live birth prediction.
Heatmaps showed that the CNN in the AI model mainly focuses on image regions of inner cell mass and trophectoderm (TE) for live birth prediction, and the contribution of TE-related features was greater in the CNN trained with the inclusion of patient couple’s clinical features compared with the CNN trained with blastocyst images alone.

Conclusions: The results suggest that the inclusion of patient couple’s clinical features along with blastocyst images increases live birth prediction accuracy.

Funding: Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs Program.

Editor's evaluation

This article provides important findings that have practical implications for reproductive medicine and would be of interest to IVF specialists. Based on the compelling strength of evidence, the study demonstrates significant results in improving the predictive value of the live birth model based on blastocyst evaluation and clinical features.

eLife digest

More than 50 million couples worldwide experience infertility. The most common treatment is in vitro fertilization (IVF). Fertility specialists collect eggs and sperm from the prospective parents. They combine the egg and sperm in a laboratory and allow the fertilized eggs to develop for five days into a multi-celled blastocyst. Then, the specialists select the healthiest blastocysts and return them to the patient's uterus. Since 1978, more than 8 million children have been conceived through IVF. Yet, only about 30% of IVF attempts result in a successful birth. As a result, fertility patients often undergo multiple rounds of IVF, which can be expensive and emotionally draining. Several factors determine IVF success, one of which is the health of the blastocysts selected for transfer to the uterus. Specialists select the blastocysts using several criteria.
But these human assessments are subjective and inconsistent in predicting which ones are most likely to result in a successful birth. Recent studies suggest artificial intelligence technology may help select blastocysts. Liu et al. show that using artificial intelligence to assess blastocysts and fertility patient characteristics leads to more accurate predictions about which blastocysts are likely to result in a successful birth. In the experiments, the researchers trained an artificial intelligence computer program using pictures of 17,580 blastocysts with known birth outcomes and the parents' clinical characteristics. The model identified 16 parental factors associated with birth outcomes. The top 5 most predictive parental factors were maternal age, the day of blastocyst transfer to the uterus, how many eggs were present in the ovaries, the number of eggs retrieved, and the thickness of the uterus lining. The program achieved the highest prediction of healthy births so far, compared to success rates listed in other studies. Artificial intelligence-aided blastocyst selection using patient and blastocyst characteristics may improve IVF success rates and reduce the number of treatment cycles patient couples undergo. Before specialists can use artificial intelligence in their clinics, they must conduct confirmatory clinical studies that enroll patient couples to compare conventional methods and artificial intelligence.

Introduction

Infertility is a global health issue, affecting more than 50 million couples worldwide (Mascarenhas et al., 2012). Since the birth of the first in vitro fertilization (IVF) child in 1978, over 8 million children have been born with IVF treatment (Adamson et al., 2018). Among the various factors contributing to IVF outcomes, the quality of the blastocyst (day 5 embryo) selected for transfer is critical for the success of IVF treatment.
Manual grading of blastocyst development stage, inner cell mass (ICM), and trophectoderm (TE) remains the most common method for blastocyst evaluation. While blastocyst morphological grading is widely used in clinical practice, morphological grades of the development stage, ICM, and TE have shown limited predictive power on clinical outcomes (Seli et al., 2011; Reignier et al., 2019; Bartolacci et al., 2021; Ueno et al., 2021; Xiong et al., 2022). Identifying features that accurately predict the clinical outcomes of blastocysts is therefore desirable. To achieve this goal, the convolutional neural network (CNN) is expected to play a critical role. A CNN is able to automatically detect discriminative features from images and has been the state-of-the-art method in various fields of medical imaging, such as lung cancer prediction (Ardila et al., 2019), breast cancer prediction (McKinney et al., 2020), and diabetic retinopathy screening (Bora et al., 2021). To apply a CNN to predict the clinical outcome of a blastocyst, images of blastocysts with a known clinical outcome (e.g., pregnancy and live birth) are collected for the CNN model development. The area under the receiver operating characteristic (ROC) curve (AUC) is the most commonly used metric to evaluate and compare machine learning models on predicting clinical outcomes of blastocysts (Kragh and Karstoft, 2021a). The AUCs reported in the literature using CNNs to predict clinical outcomes from blastocyst images range from 0.64 to 0.71 for pregnancy prediction (VerMilyea et al., 2020; Kragh et al., 2021b; Berntsen et al., 2022; Enatsu et al., 2022; Loewke et al., 2022), and are around 0.65 for live birth prediction (Miyagi et al., 2019; Nagaya and Ukita, 2021). Besides using blastocyst images, attempts have also been made to use time-lapse videos for live birth prediction. These videos capture the entire development process from day 0 to days 5–7.
However, results in the literature show that CNNs using static images achieved similar or slightly better accuracies in predicting clinical outcomes than those using time-lapse videos, for instance, AUC=0.68–0.71 (VerMilyea et al., 2020; Enatsu et al., 2022; Loewke et al., 2022) versus 0.64–0.67 (Kragh et al., 2021b; Berntsen et al., 2022) for pregnancy prediction, and 0.66 (Miyagi et al., 2019) versus 0.65 (Nagaya and Ukita, 2021) for live birth prediction. A potential reason is that the redundant frames in time-lapse videos may act as noise, causing the model to overfit and thus lowering prediction accuracy (Zhu et al., 2018; Wu et al., 2021; Tao et al., 2022). Therefore, we opted to use static blastocyst images in this study. Different from using blastocyst images alone to predict clinical outcomes, Miyagi et al., 2020 proposed to use blastocyst images together with maternal clinical features including maternal age, AMH, and BMI, and reported an AUC of 0.74, the highest accuracy in the literature. However, two questions remain elusive. First, the contribution of blastocyst images and the additional contribution of maternal clinical features to live birth prediction are unknown. Second, endometrium status-related features, such as endometrium thickness and pattern, are also critical factors impacting live birth outcomes (Ng et al., 2007; Bu et al., 2016; Mahutte et al., 2022), but were not considered. In this study, we quantified the effect of blastocyst images and the combined effect of both blastocyst images and patient couple’s clinical features on live birth prediction. The live birth prediction model using only blastocyst images achieved an AUC of 0.67, which was significantly outperformed by the AUC of 0.77 achieved by the model using both blastocyst images and patient couple’s clinical features (p value<0.0001).
Additionally, when endometrium status-related features (e.g., endometrium thickness and pattern) were excluded, the AUC of the model using both blastocyst images and patient couple’s clinical features significantly decreased to 0.74 (p value<0.0001), indicating that the inclusion of endometrium status-related clinical features helps improve live birth prediction accuracy. Sixteen patient couple’s clinical features were identified to be most related to live birth outcomes of blastocysts, among which maternal age, the day of blastocyst transfer, antral follicle count (AFC), retrieved oocyte number, and endometrium thickness measured before transfer are the top five features contributing to live birth prediction. Additionally, the CNN heatmaps showed that the CNN mainly focused on ICM and TE for live birth prediction, and the contribution of TE-related features was greater in the CNN trained with the inclusion of patient couple’s clinical features compared with the CNN trained with blastocyst images alone.

Methods

Data set collection

We used retrospectively collected data to develop the live birth prediction model. Transferred blastocysts with known live birth outcomes for patients who underwent frozen embryo transfer cycles from 2016 to 2020, at the Reproductive and Genetic Hospital of CITIC-Xiangya, were reviewed for inclusion in the data set. Informed consent was not necessary because this study used retrospective and fully de-identified data, no medical intervention was performed on the subject, and no biological samples from the patient were collected. This study was approved by the Ethics Committee of the Reproductive and Genetic Hospital of CITIC-Xiangya (approval number: LL-SC-2021-008). Blastocyst images were captured before transfer using a standard optical light microscope mounted with a camera. Two grayscale images were captured for each blastocyst, one focusing on ICM and the other focusing on TE.
Blastocysts were cropped from the original images, which have a resolution of 1024×768, and were consistently padded to 500×500 to facilitate model training. Patient couple’s clinical features consist of 103 features including maternal age and BMI, the day of blastocyst transfer, infertility diagnosis and treatment history of patient couples, ovarian stimulation protocols, maternal hormone profiles, ultrasound results measured during the ovarian stimulation process and before transfer, and paternal semen diagnosis results (see Supplementary file 1 for a complete list). Based on p value analysis and logistic regression (LR)-based sequential forward feature selection (Solorio-Fernández et al., 2020; Raschka, 2018), the 16 clinical features most relevant to live birth prediction were identified and used for training the machine learning model (see Figure 3). Feature selection reduces the input feature dimensions by removing redundant features and features with limited predictive power, thus improving the model’s generalization capability (see Figure 3—figure supplement 1). LR-based feature selection was used for its computing efficiency; we also present the results of multilayer perceptron (MLP)-based feature selection in Figure 3—figure supplement 1 and Figure 3—figure supplement 2. A total of 28,118 blastocysts with known live birth outcomes were reviewed, among which 17,580 blastocysts with two blastocyst images and all 16 clinical features available were included in the data set.

Model architecture

Figure 1 shows the architecture of the live birth prediction model based on multimodal blastocyst evaluation. It consists of a CNN to process blastocyst images and an MLP to process patient couple’s clinical features. Features from the CNN and the MLP are fused; thus, the model can be trained to simultaneously take into account both blastocyst images and patient couple’s clinical features for live birth prediction.
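The sequential forward feature selection described above can be sketched as a greedy loop that repeatedly adds the feature which most improves a scoring function, stopping when no candidate helps. The feature names and scoring function below are hypothetical stand-ins; in the study the score would be a cross-validated logistic-regression AUC over the real data.

```python
# Minimal sketch of sequential forward feature selection (greedy wrapper).
# The score function and feature names are illustrative stand-ins, not the
# study's actual cross-validated LR AUC.

def forward_select(features, score, k):
    """Greedily add the feature that most improves score(subset), up to k."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        if score(selected + [best]) <= score(selected):
            break  # no remaining feature improves the score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy score: a baseline of 0.5 plus a fixed bonus per "informative" feature.
informative = {"maternal_age": 0.10, "transfer_day": 0.06, "afc": 0.04}
score = lambda subset: 0.5 + sum(informative.get(f, 0.0) for f in subset)

features = ["maternal_age", "bmi", "transfer_day", "afc", "smoking"]
chosen = forward_select(features, score, 16)
```

With this toy score, uninformative features never enter the subset even though up to 16 are allowed, mirroring how the study reduced 103 candidate features to 16.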
The last fully connected layer in the CNN and the last fully connected layer in the MLP each output a decision-level feature, which has two variables used to classify the blastocyst into the positive or negative live birth outcome category. The adding operation fuses decision-level features from the CNN and the MLP, and the result of addition is taken as the final output of the overall live birth prediction model.

Figure 1. Architecture of the live birth prediction model based on multimodal blastocyst evaluation. CNN, convolutional neural network; FC, fully connected layer; MLP, multilayer perceptron. Figure 1—source data 1: Source code to reproduce the model (https://cdn.elifesciences.org/articles/83662/elife-83662-fig1-data1-v2.zip).

Model implementation and training

The proposed live birth prediction model used EfficientNetV2-S as the backbone CNN. EfficientNetV2-S is the baseline model in the EfficientNetV2 family, a new family of CNN models that provide higher accuracy and training speed than conventional models (Tan and Le, 2021). In our work, the output dimension of the final fully connected layer in EfficientNetV2-S was set to two, representing the positive and negative live birth outcome of a blastocyst, respectively. The model was implemented using PyTorch 1.10.1 (Paszke et al., 2019). Each of the 17,580 blastocysts had two images taken at different focal planes, one focused more on TE cells and the other on ICM. Furthermore, for each blastocyst, live birth outcomes and all 16 patient couple’s clinical features were available. The blastocysts were randomly split 80%:10%:10% to construct the training, validation, and testing data sets. Stratified random sampling was used to ensure that all split data sets have the same distribution of minority and majority classes.
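The decision-level fusion described above (adding the two-element outputs of the CNN branch and the MLP branch) can be illustrated with plain numbers. The logit values here are made up for illustration; in the actual model both vectors come from trained networks.

```python
# Sketch of decision-level fusion by addition: each branch emits a
# two-element logit vector [negative, positive], and the fused prediction
# is their element-wise sum. The logit values are hypothetical.

def fuse(cnn_logits, mlp_logits):
    """Element-wise addition of the two branches' decision-level features."""
    return [c + m for c, m in zip(cnn_logits, mlp_logits)]

def predict(logits):
    """Predicted class is the index of the larger fused logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

cnn_logits = [0.2, 0.9]   # image branch leans toward positive live birth
mlp_logits = [1.1, 0.3]   # clinical-feature branch leans toward negative
fused = fuse(cnn_logits, mlp_logits)
```

Because the two logit vectors are simply summed, neither modality can be ignored during training: gradients flow into both branches from the same fused output.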
Since the ratio of blastocysts with a positive live birth outcome in the data set is 0.368, to mitigate the model’s prediction bias toward the majority category (i.e., the negative live birth outcome), the weighted sampling approach, which helps rebalance the class distributions when sampling from an imbalanced data set (Feng et al., 2021), was employed for training the model. In the weighted sampling approach, the probability of each item being selected is determined by its weight, and the weight of each item is assigned by inverse class frequency. In this way, the weighted sampling approach rebalances the class distributions by oversampling the minority class and under-sampling the majority class. We also evaluated the approach of using weighted cross-entropy loss, which assigns greater weights to the loss caused by prediction errors on minority classes. Both approaches helped mitigate the prediction bias toward the majority class, and the results showed that the weighted sampling approach outperformed the weighted cross-entropy loss method. Model performance is subject to training hyperparameters (e.g., optimizer, learning rate, batch size, and number of layers). Hence, an automatic hyperparameter-tuning tool, Facebook Ax (version 0.2.2, https://github.com/facebook/Ax), was used to search for the optimal hyperparameters for model training. The selected hyperparameters include a batch size of 16, an SGD optimizer with a learning rate of 0.008 and a momentum of 0.39, and three hidden layers in the MLP. A dropout layer follows each hidden layer in the MLP to prevent overfitting. The numbers of nodes in the hidden layers are 6836, 5657, and 468, respectively, and the dropout rates are 0.01, 0.07, and 0.67, respectively. The model was trained with four RTX A6000 GPUs. It took about 30 hr to search for the optimal hyperparameters and about an hour to train the model using the optimal hyperparameters.
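The inverse-class-frequency weighting described above can be sketched in a few lines: each sample's selection weight is the reciprocal of its class count, so each class contributes equal total probability mass to the sampler. The toy labels below are illustrative; in the study the positive-class ratio was 0.368 across 17,580 blastocysts.

```python
# Sketch of inverse-class-frequency sampling weights for an imbalanced
# data set. Labels are hypothetical: 3 positives vs 5 negatives.
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (count of its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

labels = [1, 1, 1, 0, 0, 0, 0, 0]
weights = inverse_frequency_weights(labels)
```

Feeding such weights to a weighted sampler (e.g., PyTorch's WeightedRandomSampler) makes positive and negative examples equally likely to be drawn per batch, which is the oversampling/under-sampling effect the text describes.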
Statistical analysis

Statistical tests were performed to compare clinical features between blastocysts with positive and negative live birth outcomes. The chi-squared test was used for categorical features; the t test was used for numerical features. Both were performed using Python (version 3.6). ROC curves were compared by the DeLong test implemented in MedCalc software (version 20). All statistical tests were two-tailed and considered significant if p value≤0.05.

Results

The inclusion of patient couple’s clinical features increased AUC for live birth prediction

To quantify the individual effect of blastocyst images and the combined effect of patient couple’s clinical features, we built and compared models that (1) used only blastocyst images for live birth prediction, and (2) used both blastocyst images and patient couple’s clinical features for prediction. In addition, to quantify the specific effect of endometrium status-related features (i.e., endometrium thickness before transfer, endometrium thickness on HCG day, and endometrium pattern B (yes/no) on HCG day) on live birth prediction, a third model, trained using blastocyst images and patient couple’s clinical features with endometrium status-related features excluded, was also built and compared. Figure 2 shows the ROC curves and AUCs of the three models for predicting live birth outcomes of 1758 blastocysts (i.e., 10% of 17,580) in the test data set. Using only blastocyst images for live birth prediction gave an AUC of 0.67, with a 95% confidence interval (CI) of 0.65–0.70. Using blastocyst images and patient couple’s clinical features (endometrium status-related features excluded) significantly increased the AUC to 0.74 (95% CI: 0.72–0.76, p value<0.0001).
Using both blastocyst images and patient couple’s clinical features (endometrium status-related features included) achieved a prediction AUC of 0.77 (95% CI: 0.75–0.79), which is significantly higher than using only blastocyst images for prediction (p value<0.0001) and than using blastocyst images and patient couple’s clinical features with endometrium status-related features excluded (p value=0.007).

Figure 2. Receiver operating characteristic (ROC) analysis. ROC curves of the model using only blastocyst images, the model using blastocyst images and patient couple’s clinical features where EM status-related features were excluded, and the model using blastocyst images and patient couple’s clinical features where EM status-related features were included, to predict live birth outcomes of 1758 blastocysts in the test data set. AUC, area under the ROC curve; EM, endometrium; EM status-related features: endometrium thickness before transfer, endometrium thickness on HCG day, endometrium pattern B (yes/no) on HCG day. Figure 2—source data 1: Code and data used to generate the ROC curves (https://cdn.elifesciences.org/articles/83662/elife-83662-fig2-data1-v2.zip).

Ranking the predictive power of patient couple’s clinical features

We then investigated the predictive power of each patient couple’s clinical feature in predicting live birth outcome. Figure 3 shows the 16 features that were identified to be most related to the live birth outcomes of the blastocysts. These features were ranked according to their AUCs for individually predicting the live birth outcomes of blastocysts using univariable LR. The AUC for each feature was reported as the mean AUC over a tenfold cross-validation process.

Figure 3. Ranking the predictive power of patient couple’s clinical features.
The 16 patient couple’s clinical features that were identified to be most related to the live birth outcomes of the blastocysts, ranked by the AUC for individually predicting live birth outcome. AUC, area under the curve. Figure 3—source data 1: Code and data used to generate the AUC ranking chart (https://cdn.elifesciences.org/articles/83662/elife-83662-fig3-data1-v2.zip).

CNN heatmaps

In blastocyst images, what does the CNN focus on to predict the live birth outcome of a blastocyst? Is there a difference in what the CNN focuses on between the model trained without and with the inclusion of patient couple’s clinical features? To answer these questions, we used the class activation mapping method to generate heatmaps. Figure 4 shows blastocyst images, the corresponding heatmaps of the CNN trained without including patient couple’s clinical features, and the corresponding heatmaps of the CNN trained with including patient couple’s clinical features.

Figure 4. CNN heatmaps analysis. Heatmaps of the CNN trained without and with patient couple’s clinical features. Column (A): original blastocyst images. Column (B): corresponding heatmaps of the CNN trained without including patient couple’s clinical features. Column (C): corresponding heatmaps of the CNN trained with the inclusion of patient couple’s clinical features. CNN, convolutional neural network. Figure 4—source data 1: Code and data used to generate the heatmaps shown in Figure 4 (https://cdn.elifesciences.org/articles/83662/elife-83662-fig4-data1-v2.zip).

The blastocyst images were cropped and padded to a consistent size to facilitate model training. The padding value was calculated as the mean pixel value of blastocyst images in the data set. Recall that the CNN takes two blastocyst images per blastocyst as input, one focusing on ICM and the other focusing on TE.
The blastocyst images shown in Figure 4 are focused on ICM; the corresponding TE-focused images are shown in Figure 4—figure supplement 1. As can be seen in Figure 4, when trained using only blastocyst images, the CNN mainly focuses on ICM and TE for predicting live birth outcomes. When trained with both blastocyst images and patient couple’s clinical features, TE-related features contributed more to live birth prediction than when trained with blastocyst images alone.

Discussion

In this study, the individual effect of blastocyst images and the combined effect of patient couple’s clinical features on live birth prediction were quantified by the AUCs of the model using only blastocyst images and the model using both blastocyst images and patient couple’s clinical features. An AUC of 0.67 was achieved with blastocyst images only; using both blastocyst images and patient couple’s clinical features led to a significantly higher AUC of 0.77 in live birth prediction. When endometrium status-related features were excluded from the patient couple’s clinical features, the AUC of the live birth prediction model significantly decreased from 0.77 to 0.74, indicating the contribution of endometrium status-related features to live birth prediction. Sixteen patient couple’s clinical features were identified to be most related to live birth outcomes of blastocysts, among which maternal age, the day of blastocyst transfer, antral follicle count, retrieved oocyte number, and endometrium thickness before transfer are the top five features contributing to live birth prediction. This study was based on a multimodal data set collected for blastocyst evaluation. The data set contains 17,580 blastocysts with known live birth outcomes, blastocyst images, and 16 patient couple’s clinical features.
As shown in the figure, the 16 patient couple’s clinical features include maternal characteristics and hormone levels measured on HCG day and before transfer, endometrium status-related features (endometrium thickness on HCG day and before transfer, and endometrium pattern on HCG day), features related to the retrieved oocytes, the day of blastocyst transfer, the number of ovarian stimulations, and paternal features such as the ratio of A sperm in semen. The data set used by Miyagi et al., 2020 does not contain endometrium status-related features or hormone measurements. There are IVF data sets covering clinical features and live birth outcomes (… 2011; … et al., 2016; … et al.), but these data sets contain no blastocyst images, and thus they cannot be used to train models that evaluate blastocysts from their images. To exploit the multimodal data set, our proposed model was designed with two branches, a CNN and an MLP, enabling the model to simultaneously process images and numerical clinical features for blastocyst evaluation. The multimodal data set and the proposed model resulted in the highest AUC value of 0.77 reported thus far for predicting live birth outcomes of blastocysts, and they also made it possible to quantify the predictive power of each feature in predicting the live birth outcomes of blastocysts. Blastocyst grading remains the most common method used to evaluate blastocyst quality, but the morphological grades of blastocyst development stage, ICM, and TE have limited predictive power on live birth outcomes (e.g., as reported for live birth prediction by Reignier et al., 2019; Bartolacci et al., 2021; Xiong et al., 2022). Since the CNN is a state-of-the-art method for image analysis, many attempts have been made to apply the CNN to blastocyst evaluation for predicting clinical outcomes (e.g., … et al., 2020; Kragh et al., 2021b; Berntsen et al., 2022; Enatsu et al., 2022; Loewke et al., 2022; Miyagi et al., 2019; Nagaya and Ukita, 2021). Among these, the AUCs reported in the literature using blastocyst images only were around 0.65 for live birth prediction (Miyagi et al., 2019; Nagaya and Ukita, 2021).
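A two-branch design of the kind described, a CNN branch for images and an MLP branch for numerical clinical features fused before the output, can be sketched as below. The layer sizes, the weight scaling, and the fusion-by-concatenation choice are illustrative assumptions, not the authors' architecture; the CNN branch is stubbed out as a precomputed embedding:

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(x, w1, w2):
    """Tiny MLP branch: one hidden ReLU layer, no biases for brevity."""
    return np.maximum(x @ w1, 0.0) @ w2

# pretend the CNN branch has already produced a 32-dim image embedding
image_embedding = rng.random(32)
clinical = rng.random(16)                # 16 numerical clinical features

# MLP branch maps the 16 clinical features to an 8-dim embedding
w1 = rng.random((16, 24)) * 0.1
w2 = rng.random((24, 8)) * 0.1
clinical_embedding = mlp(clinical, w1, w2)

# late fusion: concatenate the branch embeddings, then a linear head + sigmoid
fused = np.concatenate([image_embedding, clinical_embedding])    # 40-dim
head = rng.random(40) - 0.5
live_birth_prob = 1.0 / (1.0 + np.exp(-(fused @ head)))
```

The design choice here is late fusion: each modality is encoded separately and only the embeddings are combined, which lets the image branch be pretrained or swapped independently of the clinical branch.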
Using blastocyst images only, we achieved a comparable AUC (see the figure). Compared with the AUCs reported in the literature using morphological grades for live birth prediction, these results indicate that the CNN can achieve a higher prediction accuracy. As shown in Figure 4, the CNN mainly focuses on the ICM and TE. Different from TE grading, which is based on the number of TE cells and the cohesiveness of the TE cells as a whole, the heatmaps suggest that the CNN focuses on specific TE regions. Miyagi et al., 2020 used both blastocyst images and maternal clinical features (including AMH) to predict live birth outcomes and achieved an AUC of 0.74 (Miyagi et al., 2020). The additional contribution from the three maternal clinical features was not quantified, as no AUC was reported using blastocyst images alone; furthermore, endometrium status-related features were not considered in their work. Therefore, our study used a multimodal data set and compared the AUC of live birth prediction using only blastocyst images versus using both blastocyst images and the patient couple’s clinical features. We also quantified the contribution of endometrium status-related features, in combination with blastocyst images, to improving live birth prediction. Furthermore, we found that hormone levels, features related to the retrieved oocytes, and the ratio of A sperm representing semen quality are able to work with blastocyst images to improve the live birth prediction accuracy. Note that in this study only a total value was recorded for some features, and a finer breakdown was not available for clinical feature analysis (see Supplementary file 1); this may introduce potential bias in their evaluation as predictors of live birth. Another finding of this study, revealed by the heatmaps of the CNN trained without and with the inclusion of the patient couple’s clinical features, is that the weights of TE-related features increased (see Figure 4). A potential reason may be that the TE interacts with the endometrium during implantation.
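The AUC values discussed throughout (around 0.65, 0.74, 0.77) are rank statistics: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal pure-Python computation, illustrative rather than the paper's evaluation code:

```python
def auc(labels, scores):
    """Probability a positive case outranks a negative one; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# a model that ranks all positives above all negatives has AUC 1.0;
# an uninformative constant score gives AUC 0.5
labels = [1, 1, 0, 0]
perfect = auc(labels, [0.9, 0.8, 0.3, 0.1])   # 1.0
chance = auc(labels, [0.5, 0.5, 0.5, 0.5])    # 0.5
```

This pairwise definition is equivalent to the area under the ROC curve, which is why an improvement from 0.65 to 0.77 reflects genuinely better ranking of live-birth blastocysts.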

  • Research Article
  • Cited by 5
  • 10.1136/bmjopen-2022-064739
Artificial intelligence as a diagnostic aid in cross-sectional radiological imaging of surgical pathology in the abdominopelvic cavity: a systematic review
  • Mar 1, 2023
  • BMJ Open
  • George E Fowler + 5 more

ObjectivesThere is emerging use of artificial intelligence (AI) models to aid diagnostic imaging. This review examined and critically appraised the application of AI models to identify surgical pathology from radiological...

  • Research Article
  • Cited by 1
  • 10.1016/j.jacr.2025.02.042
Comparing Artificial Intelligence and Traditional Regression Models in Lung Cancer Risk Prediction Using A Systematic Review and Meta-Analysis.
  • Jun 1, 2025
  • Journal of the American College of Radiology : JACR
  • Sierra Leonard + 5 more

Accurately identifying individuals who are at high risk of lung cancer is critical to optimize lung cancer screening with low-dose CT (LDCT). We sought to compare the performance of traditional regression models and artificial intelligence (AI)-based models in predicting future lung cancer risk. A systematic review and meta-analysis were conducted with reporting according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. We searched MEDLINE, Embase, Scopus, and the Cumulative Index to Nursing and Allied Health Literature databases for studies reporting the performance of AI or traditional regression models for predicting lung cancer risk. Two researchers screened articles, and a third researcher resolved conflicts. Model characteristics and predictive performance metrics were extracted. The quality of studies was assessed using the Prediction model Risk of Bias Assessment Tool. A meta-analysis assessed the discrimination performance of models, based on area under the receiver operating characteristic curve (AUC). One hundred forty studies met inclusion criteria and included 185 traditional and 64 AI-based models. Of these, 16 AI models and 65 traditional models have been externally validated. The pooled AUC of external validations of AI models was 0.82 (95% confidence interval [CI], 0.80-0.85), and the pooled AUC for traditional regression models was 0.73 (95% CI, 0.72-0.74). In a subgroup analysis, AI models that included LDCT had a pooled AUC of 0.85 (95% CI, 0.82-0.88). Overall risk of bias was high for both AI and traditional models. AI-based models, particularly those using imaging data, show promise for improving lung cancer risk prediction over traditional regression models. Future research should focus on prospective validation of AI models and direct comparisons with traditional methods in diverse populations.
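The pooled AUCs reported above are typically obtained by inverse-variance weighting of the study-level estimates. A fixed-effect sketch is shown here with invented study values; the review itself may well have used a random-effects model, which additionally accounts for between-study variance:

```python
def pooled_estimate(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of study-level estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# hypothetical AUCs and standard errors from three studies
aucs = [0.80, 0.84, 0.82]
ses = [0.02, 0.03, 0.02]
pooled_auc, pooled_se = pooled_estimate(aucs, ses)
```

Precise studies (small standard errors) dominate the pooled value, and the pooled standard error is always smaller than that of any single study.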

  • Research Article
  • Cited by 35
  • 10.1038/s41598-022-13652-w
Comprehensive assessment, review, and comparison of AI models for solar irradiance prediction based on different time/estimation intervals
  • Jun 10, 2022
  • Scientific Reports
  • Olusola Bamisile + 7 more

Solar energy-based technologies have developed rapidly in recent years; however, the inability to appropriately estimate solar energy resources is still a major drawback for these technologies. In this study, eight different artificial intelligence (AI) models, namely: convolutional neural network (CNN), artificial neural network (ANN), long short-term memory recurrent model (LSTM), eXtreme gradient boost algorithm (XG Boost), multiple linear regression (MLR), polynomial regression (PLR), decision tree regression (DTR), and random forest regression (RFR), are designed and compared for solar irradiance prediction. Additionally, two hybrid deep neural network models (ANN-CNN and CNN-LSTM-ANN) are developed in this study for the same task. This study is novel as each of the AI models developed was used to estimate solar irradiance considering different timesteps (hourly, every minute, and daily average). Also, different solar irradiance datasets (from six countries in Africa) measured with various instruments were used to train/test the AI models. Although the aim was to check whether there is a universal AI model for solar irradiance estimation in developing countries, the results of this study show that different AI models are suitable for different solar irradiance estimation tasks. However, XG Boost has a consistently high performance for all the case studies and is the best model for 10 of the 13 case studies considered in this paper. The result of this study also shows that the prediction of hourly solar irradiance is more accurate for the models when compared to daily average and minutes timesteps. The specific performance of each model for all the case studies is explicated in the paper.

  • Research Article
  • Cited by 135
  • 10.1002/ctm2.1216
The potential impact of ChatGPT in clinical and translational medicine.
  • Mar 1, 2023
  • Clinical and Translational Medicine
  • Vivian Weiwen Xue + 2 more

The potential impact of ChatGPT in clinical and translational medicine.

  • Research Article
  • Cited by 50
  • 10.1007/s00535-022-01849-9
Artificial intelligence (AI) models for the ultrasonographic diagnosis of liver tumors and comparison of diagnostic accuracies between AI and human experts
  • Jan 1, 2022
  • Journal of Gastroenterology
  • Naoshi Nishida + 17 more

Background Ultrasonography (US) is widely used for the diagnosis of liver tumors. However, the accuracy of the diagnosis largely depends on the visual perception of humans. Hence, we aimed to construct artificial intelligence (AI) models for the diagnosis of liver tumors in US. Methods We constructed three AI models based on still B-mode images: model-1 using 24,675 images, model-2 using 57,145 images, and model-3 using 70,950 images. A convolutional neural network was used to train the US images. The four-class liver tumor discrimination by AI, namely, cysts, hemangiomas, hepatocellular carcinoma, and metastatic tumors, was examined. The accuracy of the AI diagnosis was evaluated using tenfold cross-validation. The diagnostic performances of the AI models and human experts were also compared using an independent test cohort of video images. Results The diagnostic accuracies of model-1, model-2, and model-3 in the four tumor types are 86.8%, 91.0%, and 91.1%, whereas those for malignant tumor are 91.3%, 94.3%, and 94.3%, respectively. In the independent comparison of the AIs and physicians, the percentages of correct diagnoses (accuracies) by the AIs are 80.0%, 81.8%, and 89.1% in model-1, model-2, and model-3, respectively. Meanwhile, the median percentages of correct diagnoses are 67.3% (range 63.6%–69.1%) and 47.3% (45.5%–47.3%) by human experts and non-experts, respectively. Conclusion The performance of the AI models surpassed that of human experts in the four-class discrimination and benign and malignant discrimination of liver tumors. Thus, the AI models can help prevent human errors in US diagnosis.

  • Research Article
  • 10.54364/aaiml.2024.43159
Predicting Mandibular Bone Growth Using Artificial Intelligence and Machine Learning: A Systematic Review
  • Jan 1, 2024
  • Advances in Artificial Intelligence and Machine Learning
  • Mahmood Dashti + 6 more

Introduction The accurate prediction of mandibular bone growth is crucial in orthodontics and maxillofacial surgery, impacting treatment planning and patient outcomes. Traditional methods often fall short due to their reliance on linear models and clinician expertise, which are prone to human error and variability. Artificial intelligence (AI) and machine learning (ML) offer advanced alternatives, capable of processing complex datasets to provide more accurate predictions. This systematic review examines the efficacy of AI and ML models in predicting mandibular growth compared to traditional methods. Method. A systematic review was conducted following the PRISMA guidelines, focusing on studies published up to July 2024. Databases searched included PubMed, Embase, Scopus, and Web of Science. Studies were selected based on their use of AI and ML algorithms for predicting mandibular growth. A total of 31 studies were identified, with 6 meeting the inclusion criteria. Data were extracted on study characteristics, AI models used, and prediction accuracy. The risk of bias was assessed using the QUADAS-2 tool. Results. The review found that AI and ML models generally provided high accuracy in predicting mandibular growth. For instance, the LASSO model achieved an average error of 1.41 mm for predicting skeletal landmarks. However, not all AI models outperformed traditional methods; in some cases, deep learning models were less accurate than conventional growth prediction models. Discussion. The variability in datasets and study designs across the included studies posed challenges for comparing AI models’ effectiveness. Additionally, the complexity of AI models may limit their clinical applicability. Despite these challenges, AI and ML show significant promise in enhancing predictive accuracy for mandibular growth. Conclusion. 
AI and ML models have the potential to revolutionize mandibular growth prediction, offering greater accuracy and reliability than traditional methods. However, further research is needed to standardize methodologies, expand datasets, and improve model interpretability for clinical integration.

  • Preprint Article
  • 10.5194/egusphere-egu25-166
Impact of using additional precipitation data from the uppermost region on improving the performance of AI models in predicting groundwater levels
  • Mar 18, 2025
  • Mun-Ju Shin + 6 more

Groundwater is an important water resource that is widely used worldwide for agricultural, industrial, and domestic purposes. In the case of Jeju Island, located in southern South Korea, groundwater is an indispensable water resource that accounts for 82% of the total water supply. Therefore, scientific prediction and management of groundwater levels are very important for the sustainable use of groundwater by citizens. This study additionally used precipitation data from the Baekrokdam Climate Change Observatory located on the summit of Jeju Island in artificial intelligence (AI) models to accurately predict one-month-ahead future groundwater levels for the mid-mountainous areas of Jeju Island, where groundwater levels are highly variable. In other words, the AI models compared and analyzed the improvement effect of the monthly groundwater level prediction performance for 1) using precipitation data from two rainfall stations, groundwater withdrawal data from two groundwater sources, and groundwater level data from two monitoring wells in the study area, and 2) adding precipitation data from Baekrokdam Climate Change Observatory. The study subjects are two groundwater level monitoring wells located at 435–471 m above mean sea level in the southeast of Jeju Island. The AI models used to predict groundwater levels are Artificial Neural Network (ANN) and Long Short-Term Memory (LSTM), a deep learning AI model. As a result, when the Baekrokdam precipitation data were not used, the two AI models showed excellent groundwater level prediction performance with Nash-Sutcliffe efficiency (NSE) values of 0.871 or higher. The LSTM model showed relatively higher prediction performance for high and low groundwater levels than the ANN model. This means that the LSTM model adequately incorporates the seasonal effects of wet and dry periods into groundwater level simulations.
The more volatile the observed groundwater level, the more difficult it is for the AI models to interpret the characteristics of groundwater level fluctuations, and the lower the performance of predicting future groundwater levels. When additional Baekrokdam precipitation data were used, the two AI models showed improved groundwater level prediction performance by having NSE values of 0.907 or higher. This means that the additional use of precipitation data located in the uppermost region provides more information to help interpret groundwater levels, allowing AI models to better interpret the characteristics of groundwater level fluctuations. In addition, the use of Baekrokdam precipitation data was more helpful in improving groundwater level prediction for the monitoring well, which has highly variable groundwater levels that are difficult to predict, and the ANN model with relatively low groundwater level prediction performance. When additional Baekrokdam precipitation data was used for a specific monitoring well, the groundwater level prediction performance of the ANN model was improved to a level comparable to that of the LSTM model, which is a deep learning AI, even with a relatively simple ANN model structure. This is an example of how important it is to use additional useful data in research using AI models.
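The Nash-Sutcliffe efficiency used above compares model error against the variance of the observations: NSE = 1 means a perfect fit, while NSE = 0 means the model is no better than predicting the observed mean. A quick sketch (the series below are made-up numbers, not the study's data):

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    err = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    var = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - err / var

obs = [3.0, 4.0, 5.0, 6.0]
perfect = nse(obs, obs)             # 1.0
mean_model = nse(obs, [4.5] * 4)    # 0.0: no better than the mean
```

This is why values of 0.871 and 0.907 indicate strong predictive skill: the models explain most of the observed groundwater-level variance.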

  • Research Article
  • 10.1002/cre2.70150
Artificial Intelligence and Hand Hygiene Accuracy: A New Era in Infection Control for Dental Practices
  • May 26, 2025
  • Clinical and Experimental Dental Research
  • Salwa A Aldahlawi + 2 more

ABSTRACT Objective The study aimed to assess the efficacy of an artificial intelligence (AI) model in evaluating hand hygiene (HH) performance compared to infection control auditors in dental clinics. Material and Method The AI model utilized a pretrained convolutional neural network (CNN) and was fine‐tuned on a custom data set of videos showing dental students performing alcohol‐based hand rub (ABHR) procedures. A total of 66 videos were recorded, with 33 used for training and 11 for validating the model. The remaining 22 videos were designated for testing and the AI–infection control auditors comparison experiment. Two infection control auditors assessed the HH performance videos using a standardized checklist. The model's performance was evaluated through precision, recall, and F1 score across various classes. The level of agreement between the auditors and the AI assessments was measured using Cohen's kappa, and the sensitivity and specificity of the AI were compared to those of the infection control auditors. Results The AI model learned to differentiate between classes of hand movement, with an overall F1 score of 0.85. Results showed a 90.91% agreement rate between the AI model and infection control auditors in evaluating HH steps, with a sensitivity of 85.7% and specificity of 100% in identifying acceptable HH practices. Step 3 (back of fingers to opposing palm with fingers interlocked) was consistently identified as the most frequently missed step by both the AI model and the infection control auditors. Conclusion The AI model assessment of HH performance closely matched auditors' evaluations, suggesting its reliability as a tool for evaluating and mentoring HH in dental clinics. Future research should explore the application of AI technology in different dental settings to further validate its feasibility and adaptability.
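Cohen's kappa, used above to measure AI-auditor agreement, corrects the observed agreement rate for the agreement expected by chance. A minimal sketch for two binary raters (the rating vectors are invented for illustration):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two binary raters (labels 0/1)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a1 = sum(rater_a) / n
    p_b1 = sum(rater_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

a = [1, 1, 0, 0, 1, 0, 1, 0]
b = [1, 1, 0, 0, 1, 0, 0, 1]
kappa_perfect = cohens_kappa(a, a)   # identical ratings: kappa = 1.0
kappa_partial = cohens_kappa(a, b)   # 6/8 raw agreement, chance-corrected
```

Note how 75% raw agreement shrinks once chance agreement is subtracted, which is why kappa is preferred over a plain agreement percentage.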

  • Research Article
  • Cited by 9
  • 10.1016/j.humpath.2022.11.004
Deep learning-based predictions of clear and eosinophilic phenotypes in clear cell renal cell carcinoma
  • Nov 11, 2022
  • Human Pathology
  • Chisato Ohe + 11 more

We have recently shown that histological phenotypes focusing on clear and eosinophilic cytoplasm in clear cell renal cell carcinoma (ccRCC) correlated with prognosis and the response to angiogenesis inhibition and checkpoint blockade. This study aims to objectively show the diagnostic utility of clear or eosinophilic phenotypes of ccRCC by developing an artificial intelligence (AI) model using the TCGA-ccRCC dataset and to demonstrate if the clear or eosinophilic predicted phenotypes correlate with pathological factors and gene signatures associated with angiogenesis and cancer immunity. Before the development of the AI model, histological evaluation using hematoxylin and eosin whole-slide images of the TCGA-ccRCC cohort (n=435) was performed by a urologic pathologist. The AI model was developed as follows. First, the highest-grade area on each whole slide image was captured for image processing. Second, the selected regions were cropped into tiles. Third, the AI model was trained using transfer learning on a deep convolutional neural network, and clear or eosinophilic predictions were scaled as AI scores. Next, we verified the AI model using a validation cohort (n=95). Finally, we evaluated the accuracy of the prognostic predictions of the AI model and revealed that the AI model detected clear and eosinophilic phenotypes with high accuracy. The AI model stratified the patients' outcomes, and the predicted eosinophilic phenotypes correlated with adverse clinicopathological characteristics and high immune-related gene signatures. In conclusion, the AI-based histologic subclassification accurately predicted clear or eosinophilic phenotypes of ccRCC, allowing for consistently reproducible stratification for prognostic and therapeutic stratification.
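The tiling step described above (cropping a selected whole-slide region into fixed-size patches before CNN training) can be sketched with NumPy. The region and tile sizes below are arbitrary illustrations, not the study's actual dimensions:

```python
import numpy as np

def tile_image(region, tile):
    """Split an (H, W) region into non-overlapping (tile, tile) patches,
    discarding any remainder at the edges."""
    h, w = region.shape
    rows, cols = h // tile, w // tile
    trimmed = region[: rows * tile, : cols * tile]
    return (trimmed.reshape(rows, tile, cols, tile)
                   .swapaxes(1, 2)
                   .reshape(-1, tile, tile))

# toy 10x12 region split into 4x4 tiles -> 2 x 3 = 6 tiles
region = np.arange(10 * 12, dtype=float).reshape(10, 12)
tiles = tile_image(region, 4)
```

Each tile is then scored independently by the network, and the per-tile predictions are aggregated into a slide-level score.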

  • Research Article
  • 10.1097/js9.0000000000004443
Artificial Intelligence (AI) in the diagnosis and prediction of adverse pregnancy outcomes for Placenta Accreta Spectrum Disorders (PAS): a systematic review and meta-analysis of diagnostic accuracy.
  • Dec 16, 2025
  • International journal of surgery (London, England)
  • Kai Chen + 6 more

Precise prenatal diagnosis of Placenta accreta spectrum disorders (PAS) is challenging, and the diagnostic performance of conventional imaging modalities remains suboptimal. Artificial intelligence (AI) technologies have emerged as promising tools in assisting image analysis and improving diagnostic accuracy of PAS. Therefore, this study aims to systematically evaluate the diagnostic performance of AI models in diagnosing PAS and predicting adverse pregnancy outcomes (APO) associated with PAS. A systematic search was conducted across multiple databases, including PubMed, Embase, and Cochrane Library, to identify studies assessing the diagnostic performance of AI-based models in PAS or their ability to predict APO. Diagnostic metrics such as sensitivity, specificity, area under the curve (AUC), positive likelihood ratio, negative likelihood ratio, and summary receiver operating characteristic (SROC) curves were pooled to evaluate diagnostic accuracy. Heterogeneity was assessed using Cochran Q and I2 statistics, and meta-regression and subgroup analysis were conducted to examine potential sources of heterogeneity. The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was utilized to assess the study quality. A total of 16 studies involving 4,457 participants were included. The pooled results showed that AI models exhibit high sensitivity (88%, 95% CI: 81%-93%) and specificity (88%, 95% CI: 76%-94%) for diagnosing PAS, with an excellent AUC of 0.94 (95% CI: 0.91-0.96). Moreover, AI models also indicated promising performance in predicting clinically significant APO such as massive hemorrhage and hysterectomy, yielding a pooled sensitivity of 80% (95% CI: 73%-85%), specificity of 86% (95% CI: 78%-92%), and AUC of 0.87 (95% CI: 0.84-0.90). Meta-regression and subgroup analysis identified study design as a primary source of heterogeneity. 
AI algorithms exhibited favorable performance for diagnosing PAS and predicting APO associated with PAS, suggesting the clinical translation potential of AI in enhancing the efficiency of diagnostic workflows and potentially reducing maternal morbidity and mortality.
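Pooled sensitivity and specificity of the kind reported above are often computed from each study's 2x2 counts on the logit scale. The sketch below is a simple fixed-effect version with invented true-positive counts; the bivariate model typically used in diagnostic-accuracy meta-analyses is more sophisticated:

```python
import math

def pooled_proportion(events, totals):
    """Inverse-variance pooling of proportions on the logit scale,
    with a 0.5 continuity correction for stability."""
    logits, weights = [], []
    for e, n in zip(events, totals):
        p = (e + 0.5) / (n + 1.0)
        logits.append(math.log(p / (1 - p)))
        weights.append(1.0 / (1.0 / (e + 0.5) + 1.0 / (n - e + 0.5)))
    pooled_logit = sum(w * l for w, l in zip(weights, logits)) / sum(weights)
    return 1.0 / (1.0 + math.exp(-pooled_logit))

# hypothetical true-positive counts and diseased totals from three studies
sens = pooled_proportion([45, 80, 27], [50, 90, 30])
```

Pooling on the logit scale keeps the estimate inside (0, 1) and weights each study roughly by its information content.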
