
Getting Started with Artificial Intelligence Models


Similar Papers
  • Research Article
  • Cite Count: 10
  • 10.1016/j.jacr.2021.06.025
Real-World Surveillance of FDA-Cleared Artificial Intelligence Models: Rationale and Logistics.
  • Feb 1, 2022
  • Journal of the American College of Radiology
  • Keith J Dreyer + 2 more

  • Research Article
  • Cite Count: 14
  • 10.1101/2025.03.14.25323836
Performance of DeepSeek, Qwen 2.5 MAX, and ChatGPT Assisting in Diagnosis of Corneal Eye Diseases, Glaucoma, and Neuro-Ophthalmology Diseases Based on Clinical Case Reports.
  • Mar 17, 2025
  • medRxiv : the preprint server for health sciences
  • Zain S Hussain + 12 more

This study evaluates the diagnostic performance of several AI models, including DeepSeek, in diagnosing corneal diseases, glaucoma, and neuro-ophthalmologic disorders. We retrospectively selected 53 case reports from the Department of Ophthalmology and Visual Sciences at the University of Iowa, comprising 20 corneal disease cases, 11 glaucoma cases, and 22 neuro-ophthalmology cases. The case descriptions were input into DeepSeek, ChatGPT-4.0, ChatGPT-01, and Qwens 2.5 Max. These responses were compared with diagnoses rendered by human experts (corneal specialists, glaucoma attendings, and neuro-ophthalmologists). Diagnostic accuracy and interobserver agreement, defined as the percentage difference between each AI model's performance and the average human expert performance, were determined. DeepSeek achieved an overall diagnostic accuracy of 79.2%, with specialty-specific accuracies of 90.0% in corneal diseases, 54.5% in glaucoma, and 81.8% in neuro-ophthalmology. ChatGPT-01 outperformed the other models with an overall accuracy of 84.9% (85.0% in corneal diseases, 63.6% in glaucoma, and 95.5% in neuro-ophthalmology), while Qwens exhibited a lower overall accuracy of 64.2% (55.0% in corneal diseases, 54.5% in glaucoma, and 77.3% in neuro-ophthalmology). Interobserver agreement analysis revealed that in corneal diseases, DeepSeek differed by -3.3% (90.0% vs 93.3%), ChatGPT-01 by -8.3%, and Qwens by -38.3%. In glaucoma, DeepSeek outperformed the human expert average by +3.0% (54.5% vs 51.5%), while ChatGPT-4.0 and ChatGPT-01 exceeded it by +12.1%, and Qwens was +3.0% above the human average. In neuro-ophthalmology, DeepSeek and ChatGPT-4.0 were 9.1% lower than the human average, ChatGPT-01 exceeded it by +4.6%, and Qwens was 13.6% lower. ChatGPT-01 demonstrated the highest overall diagnostic accuracy, especially in neuro-ophthalmology, while DeepSeek and ChatGPT-4.0 showed comparable performance. Qwens underperformed relative to the other models, especially in corneal diseases. Although these AI models exhibit promising diagnostic capabilities, they currently lag behind human experts in certain areas, underscoring the need for a collaborative integration of clinical judgment.
This study evaluated how well several artificial intelligence (AI) models diagnose eye diseases compared to human experts. We tested four AI systems across three types of eye conditions: diseases of the cornea, glaucoma, and neuro-ophthalmologic disorders. Overall, one AI model, ChatGPT-01, performed the best, correctly diagnosing about 85% of cases, and it excelled in neuro-ophthalmology by correctly diagnosing 95.5% of cases. Two other models, DeepSeek and ChatGPT-4.0, each achieved an overall accuracy of around 79%, while the Qwens model performed lower, with an overall accuracy of about 64%. When compared with human experts, who achieved very high accuracy in corneal diseases (93.3%) and neuro-ophthalmology (90.9%) but lower in glaucoma (51.5%), the AI models showed mixed results. In glaucoma, for instance, some AI models even outperformed human experts slightly, while in corneal diseases, all AI models were less accurate than the experts. These findings indicate that while AI shows promise as a supportive tool in diagnosing eye conditions, it still needs further improvement. Combining AI with human clinical judgment appears to be the best approach for accurate eye disease diagnosis.
Why carry out this study? With the rising burden of eye diseases and the inherent diagnostic challenges for complex conditions like glaucoma and neuro-ophthalmologic disorders, there is an unmet need for innovative diagnostic tools to support clinical decision-making. What did the study ask? This study evaluated the diagnostic performance of four AI models across three ophthalmologic subspecialties, testing the hypothesis that advanced language models can achieve accuracy levels comparable to human experts. What was learned from the study? Our results showed that ChatGPT-01 achieved the highest overall accuracy (84.9%), excelling in neuro-ophthalmology with a 95.5% accuracy, while DeepSeek and ChatGPT-4.0 each achieved 79.2%, and Qwens reached 64.2%. What specific outcomes were observed? In glaucoma, AI model accuracies ranged from 54.5% to 63.6%, with some models slightly surpassing the human expert average of 51.5%, underscoring the diagnostic difficulty of this condition. What has been learned and future implications? These findings highlight the potential of AI as a valuable adjunct to clinical judgment in ophthalmology, although further research and the integration of multimodal data are essential to optimize these tools for routine clinical practice.
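The overall accuracies quoted in this abstract appear to be pooled over all 53 cases rather than averaged across the three specialties. The sketch below checks that reading. The per-specialty counts of correct cases are back-calculated from the reported percentages and case counts; they are an inference for illustration, not figures reported in the abstract.

```python
# Case counts per specialty, as stated in the abstract.
CASES = {"cornea": 20, "glaucoma": 11, "neuro": 22}

# Specialty accuracies (%) reported in the abstract for three of the models.
REPORTED = {
    "DeepSeek": {"cornea": 90.0, "glaucoma": 54.5, "neuro": 81.8},
    "ChatGPT-01": {"cornea": 85.0, "glaucoma": 63.6, "neuro": 95.5},
    "Qwens": {"cornea": 55.0, "glaucoma": 54.5, "neuro": 77.3},
}


def overall_accuracy(per_specialty):
    """Pool inferred correct-case counts over all specialties (in percent)."""
    # Rounding recovers a whole number of correct cases from each percentage;
    # these counts are inferred, not taken from the source.
    correct = sum(round(pct / 100 * CASES[name]) for name, pct in per_specialty.items())
    return 100 * correct / sum(CASES.values())


for model, accuracies in REPORTED.items():
    print(f"{model}: {overall_accuracy(accuracies):.1f}%")
```

Under this assumption the pooled values round to 79.2%, 84.9%, and 64.2%, matching the overall accuracies quoted above.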

  • Research Article
  • Cite Count: 3
  • 10.1111/cts.70353
A Comparison of AI and Population PK Models to Predict the Concentrations of Antiepileptic Drugs Using Therapeutic Drug Monitoring Records
  • Oct 23, 2025
  • Clinical and Translational Science
  • Tae Kyu Chung + 1 more

Population pharmacokinetic (PK) models are commonly used to predict drug concentrations, but artificial intelligence (AI) models have gained interest due to their ability to identify complex patterns without requiring mathematical assumptions. This study compares the predictive performance of AI and population PK models using therapeutic drug monitoring (TDM) records of four antiepileptic drugs (AEDs): carbamazepine (CBZ), phenobarbital (PHB), phenytoin (PHE), and valproic acid (VPA). Additionally, we analyzed key covariates influencing drug concentration prediction using the most accurate model. We extracted concentration data for CBZ, PHB, PHE, and VPA from TDM reports at Seoul National University Hospital (2010–2021), along with patient diagnoses and lab results. The predictive performances of 10 AI models, including ensemble and deep learning models, were compared with published population PK models. The predictive performance of AI models generally exceeded that of population PK models. The best‐performing AI models, such as Adaboost, eXtreme Gradient Boosting, and Random Forest, had lower root mean squared error values for CBZ, PHB, PHE, and VPA (2.71, 27.45, 4.15, and 13.68 μg/mL, respectively) compared to population PK models (3.09, 26.04, 16.12, and 25.02 μg/mL). The most influential covariate was time after last drug administration. AI models, particularly ensemble methods, showed strong predictive performance and may support individualized AED dosing, improving therapeutic outcomes while minimizing adverse effects.
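The comparison above rests on root mean squared error (RMSE), which carries the same unit as the measured concentration (here μg/mL). A minimal sketch of the metric follows. The concentrations and the two sets of predictions are hypothetical values chosen for illustration, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical drug concentrations in ug/mL, for illustration only.
observed = [6.0, 8.0, 10.0, 7.0]
model_a = [6.5, 7.0, 9.0, 7.5]   # stand-in for one candidate model
model_b = [8.0, 11.0, 7.0, 9.0]  # stand-in for a second candidate model

print(f"model A RMSE: {rmse(observed, model_a):.2f} ug/mL")
print(f"model B RMSE: {rmse(observed, model_b):.2f} ug/mL")
```

A lower RMSE means predictions sit closer to the measured concentrations on average, with large misses penalized more heavily than small ones.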

  • Research Article
  • 10.1117/1.jmm.24.4.044201
Relating human and AI-based detection limits in scanning electron microscopy dimensional metrology
  • Jan 1, 2025
  • Journal of micro/nanolithography, MEMS, and MOEMS : JM3
  • Peter Bajcsy + 2 more

Background: Nanoscale measurements of critical dimensions in semiconductor manufacturing rely on scanning electron microscopy (SEM) and SEM image analyses. The acquisition of SEM images requires a low primary electron beam current and a low dose of the SEM imaging microscope to avoid integrated circuit (IC) sample charging and inflicted damage to sensitive IC structures. These requirements inevitably result in noisy, low-contrast images, which can make traditional SEM image analyses no longer viable. Aim: With the advancement in computational hardware and artificial intelligence (AI) models, IC structure detection via AI-based SEM image segmentation can extend the viability of these measurements from noisy, low-contrast images. However, the use of AI models raises questions about the detection limits of extracted measurements and the confidence in such measurements. Approach: Our approach is to relate SEM image quality characteristics with AI-based object segmentation accuracy to establish detection limits of AI-based models and their relationships to human detection limits. Using SEM image simulation software, we create six image sets of quasi-circular objects on a substrate with varying noise and contrast characteristics. These sets of SEM images are characterized by 25 image quality metrics and then used to train and evaluate three AI models. The 25 SEM image quality characteristics and three AI model accuracy metrics per SEM image define the mapping between the quality of input SEM images and the performance of the trained AI models. Results: We used the mapping to establish the detection limits of trained AI models with respect to a required confidence and then relate the human detection limits to the trained AI model detection limits. The human detection limit was established by Rose as the minimal signal-to-noise ratio (SNR = 5) to reliably delineate the shape and size of objects in an image. We matched the signal-to-noise ratio (SNR) defined by Rose to image quality characteristics and demonstrated the upper and lower SNR bounds for three AI models with respect to the human detection limit and for a specified confidence. Conclusion: We establish a method to determine the detection limits of AI-based SEM dimensional metrology. The study is relevant to semiconductor vendors and consumers of AI models because critical dimension measurements are derived from noisy and low-contrast SEM images using AI models with varying performance characteristics. Given a measured SEM image with estimated noise and contrast characteristics, each AI model will be characterized by unique detection limits that can be trusted in semiconductor production. Our method enables improving the trust in critical dimensions while using advanced AI models.
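The Rose limit (SNR = 5) cited above can be illustrated with a simple threshold check. The sketch below assumes one common SNR definition, the absolute difference between mean object and mean background intensity divided by the background standard deviation. The abstract does not say which of its 25 image quality metrics was matched to the Rose SNR, so this definition and the gray levels are assumptions.

```python
import statistics

ROSE_SNR = 5.0  # human detection limit cited in the abstract


def snr(object_pixels, background_pixels):
    """Assumed SNR definition: |mean(object) - mean(background)| / std(background)."""
    contrast = abs(statistics.mean(object_pixels) - statistics.mean(background_pixels))
    noise = statistics.stdev(background_pixels)
    return contrast / noise


def meets_rose_criterion(object_pixels, background_pixels):
    """True when the object would be reliably detectable by the Rose limit."""
    return snr(object_pixels, background_pixels) >= ROSE_SNR


# Hypothetical gray levels, for illustration only.
background = [10, 12, 8, 11, 9, 10]
bright_object = [30, 31, 29, 30]
faint_object = [12, 13, 11, 12]

print(meets_rose_criterion(bright_object, background))
print(meets_rose_criterion(faint_object, background))
```

An AI model whose detection limit lies below this threshold would, in the abstract's terms, extend measurement to images a human reader could not reliably interpret.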

  • Research Article
  • 10.1158/1538-7445.advbc23-b078
Abstract B078: Artificial-intelligence-driven breast density assessment in the transition from full-field digital mammograms to digital breast tomosynthesis
  • Feb 1, 2024
  • Cancer Research
  • Krisha Anant + 3 more

Introduction: To enhance reproducibility and robustness in mammographic density assessment, various artificial intelligence (AI) models have been proposed to automatically classify mammographic images into BI-RADS density categories. Despite their promising performances, so far density AI models have been assessed primarily in traditional full-field digital mammography (FFDM) images. Our study aims to assess the potential of AI in breast density assessment in FFDM versus the newer synthetic mammography (SM) images acquired with digital breast tomosynthesis. Methods: We retrospectively analyzed negative (BI-RADS 1 or 2) routine mammographic screening exams (Selenia or Selenia Dimensions; Hologic) acquired at sites within the Barnes-Jewish/Christian (BJC) Healthcare network in St. Louis, MO from 2015 to 2018. BI-RADS breast density assessments of radiologists were obtained from BJC’s mammography reporting software (Magview 7.1). For each mammographic imaging modality, a balanced dataset of 4,000 women was selected so there were equal numbers of women in each of the four BI-RADS density categories, and each woman had at least one mediolateral oblique (MLO) and one craniocaudal (CC) view per breast in that mammographic imaging modality. Previously validated pre-processing steps were applied to all FFDM and SM images to standardize image orientation and intensity. Images were then split into training, validation, and test sets at ratios of 80%, 10%, and 10%, respectively, while maintaining the distribution of breast density categories and ensuring that all images of the same woman appear only in one set. Our AI model was based on the widely used ResNet50 architecture and was designed to accept as an input a mammographic image and predict the BI-RADS breast density category that the image belongs to. Our AI model was optimized, trained, and evaluated separately for each mammographic imaging modality. 
We report on the AI model’s predictive accuracy on the test set for each mammographic imaging modality, for both views as well as separately for CC and MLO; accuracy differences in FFDM versus SM were assessed via bootstrapping. Results: A batch size of 32, learning rate of e-6, and Adam optimizer were chosen as the optimal hyperparameters for our AI model. Using the same hyperparameters, the AI model demonstrated substantially higher accuracy on the test set for FFDM than for SM (FFDM: accuracy = 71% ± 4.5% versus SM: accuracy = 66% ± 4.2%; p-value < 0.001 for comparison). Similar conclusions held when CC and MLO views were evaluated separately (accuracy = 72% ± 4.6% versus 66% ± 4.3% for CC; accuracy = 69% ± 4.5% versus 62% ± 4.3% for MLO; p-value < 0.001 for both comparisons). Conclusions: AI performance in BI-RADS breast density assessment was significantly higher on FFDM versus SM, even under the same AI model design, dataset size and training process. Our preliminary findings suggest that further AI optimizations and adaptations may be needed as we translate AI models from FFDM to the newer SM format acquired with digital breast tomosynthesis. Citation Format: Krisha Anant, Juanita Hernandez Lopez, Debbie Bennett, Aimilia Gastounioti. Artificial-intelligence-driven breast density assessment in the transition from full-field digital mammograms to digital breast tomosynthesis [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: Advances in Breast Cancer Research; 2023 Oct 19-22; San Diego, California. Philadelphia (PA): AACR; Cancer Res 2024;84(3 Suppl_1):Abstract nr B078.
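The data split described in the Methods (80%/10%/10%, preserving the density distribution and keeping all images of a woman in one set) can be sketched as a split over women rather than images. The function name, the identifiers, and the category labels below are hypothetical, and this is not the authors' pipeline.

```python
import random
from collections import Counter, defaultdict


def split_by_woman(density_by_woman, seed=0):
    """Assign each woman (not each image) to train/val/test at about 80/10/10,
    stratified by her density category."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for woman_id, category in density_by_woman.items():
        by_category[category].append(woman_id)

    assignment = {}
    for category in sorted(by_category):
        women = sorted(by_category[category])
        rng.shuffle(women)
        n_train = int(0.8 * len(women))
        n_val = int(0.1 * len(women))
        for index, woman_id in enumerate(women):
            if index < n_train:
                assignment[woman_id] = "train"
            elif index < n_train + n_val:
                assignment[woman_id] = "val"
            else:
                assignment[woman_id] = "test"
    return assignment


# Hypothetical balanced cohort: 4,000 women, 1,000 per density category.
cohort = {f"w{i}": "ABCD"[i % 4] for i in range(4000)}
split = split_by_woman(cohort)
print(Counter(split.values()))
```

Every image then inherits the set of the woman it belongs to, so her CC and MLO views can never straddle the training and test sets.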

  • Research Article
  • 10.1007/s00330-026-12526-3
Mammography-based artificial intelligence model for predicting axillary lymph node status after neoadjuvant therapy in breast cancer.
  • Apr 18, 2026
  • European radiology
  • Keyu Mao + 10 more

Our objective is to develop a deep learning-based artificial intelligence (AI) model capable of analyzing digital mammography (DM) images to predict axillary lymph node (ALN) status subsequent to neoadjuvant therapy (NAT) in breast cancer patients. We developed and validated an AI model for predicting post-NAT ALN status using images and clinical data of 956 invasive non-specific breast cancer patients with positive ALN metastasis from three medical centers. During development, four image cropping methods and five backbone networks were compared for classification architecture construction. The AI model was evaluated via internal and external test sets, with performance assessed using the ROC curve and AUC. Experiments showed that the AI model using "fixed 5 cm" image clipping and Swin Transformer V2 as the backbone feature extraction network for primary image processing achieved the best ALN status prediction performance. Compared with merely inputting the primary lesion, adding the pre-training model and clinical features further improved the prediction performance of the AI model, in the training set (AUC = 0.823, 95% CI: 0.797-0.846, p < 0.001), internal validation set (AUC = 0.774, 95% CI: 0.722-0.818, p < 0.001), internal test set (AUC = 0.778, 95% CI: 0.739-0.813, p = 0.034) and external test set (AUC = 0.756, 95% CI: 0.700-0.805, p = 0.013). After inputting primary and auxiliary region images and clinical features into the AI model, the AUC value was further improved, reaching above 0.8 in all four datasets. This study constructed an AI model based on baseline DM images that demonstrates good performance in predicting ALN status in breast cancer patients after NAT, providing decision support to avoid excessive surgery.
Question: Due to the lack of reliable methods to accurately judge the status of ALNs in breast cancer patients after NAT, some patients are overtreated.
Findings: The AI model we constructed based on the primary lesion of DM before NAT can predict the status of ALNs accurately after NAT.
Clinical relevance: The AI model can predict the status of ALNs after NAT, which may help clinical selection of more beneficial treatment modalities.
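The AUC values reported above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes AUC in that pairwise (Mann-Whitney) form. The labels and scores are hypothetical and unrelated to the study's data.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly,
    counting ties as one half (Mann-Whitney formulation)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = positive) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print(f"AUC = {auc(labels, scores):.3f}")
```

On this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.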

  • Research Article
  • Cite Count: 1
  • 10.54364/aaiml.2024.43159
Predicting Mandibular Bone Growth Using Artificial Intelligence and Machine Learning: A Systematic Review
  • Jan 1, 2024
  • Advances in Artificial Intelligence and Machine Learning
  • Mahmood Dashti + 6 more

Introduction. The accurate prediction of mandibular bone growth is crucial in orthodontics and maxillofacial surgery, impacting treatment planning and patient outcomes. Traditional methods often fall short due to their reliance on linear models and clinician expertise, which are prone to human error and variability. Artificial intelligence (AI) and machine learning (ML) offer advanced alternatives, capable of processing complex datasets to provide more accurate predictions. This systematic review examines the efficacy of AI and ML models in predicting mandibular growth compared to traditional methods. Method. A systematic review was conducted following the PRISMA guidelines, focusing on studies published up to July 2024. Databases searched included PubMed, Embase, Scopus, and Web of Science. Studies were selected based on their use of AI and ML algorithms for predicting mandibular growth. A total of 31 studies were identified, with 6 meeting the inclusion criteria. Data were extracted on study characteristics, AI models used, and prediction accuracy. The risk of bias was assessed using the QUADAS-2 tool. Results. The review found that AI and ML models generally provided high accuracy in predicting mandibular growth. For instance, the LASSO model achieved an average error of 1.41 mm for predicting skeletal landmarks. However, not all AI models outperformed traditional methods; in some cases, deep learning models were less accurate than conventional growth prediction models. Discussion. The variability in datasets and study designs across the included studies posed challenges for comparing AI models’ effectiveness. Additionally, the complexity of AI models may limit their clinical applicability. Despite these challenges, AI and ML show significant promise in enhancing predictive accuracy for mandibular growth. Conclusion. 
AI and ML models have the potential to revolutionize mandibular growth prediction, offering greater accuracy and reliability than traditional methods. However, further research is needed to standardize methodologies, expand datasets, and improve model interpretability for clinical integration.

  • Research Article
  • Cite Count: 12
  • 10.3389/fmed.2022.934865
Patient selection for corneal topographic evaluation of keratoconus: A screening approach using artificial intelligence
  • Aug 4, 2022
  • Frontiers in Medicine
  • Hyunmin Ahn + 6 more

Background: Corneal topography is a clinically validated examination method for keratoconus. However, there is no clear guideline regarding patient selection for corneal topography. We developed and validated a novel artificial intelligence (AI) model to identify patients who would benefit from corneal topography based on basic ophthalmologic examinations, including a survey of visual impairment, best-corrected visual acuity (BCVA) measurement, intraocular pressure (IOP) measurement, and autokeratometry. Methods: A total of five AI models (three individual models, namely a fully connected neural network, XGBoost, and TabNet, and two ensemble models with hard and soft voting methods) were trained and validated. We used three datasets collected from the records of 2,613 patients' basic ophthalmologic examinations from two institutions to train and validate the AI models. We trained the AI models using a dataset from a third medical institution to determine whether corneal topography was needed to detect keratoconus. Finally, a prospective intra-validation dataset (internal test dataset) and an extra-validation dataset from a different medical institution (external test dataset) were used to assess the performance of the AI models. Results: The ensemble model with the soft voting method outperformed all other AI models in sensitivity when predicting which patients needed corneal topography (90.5% in the internal test dataset and 96.4% in the external test dataset). In the error analysis, most of the prediction errors occurred within the range of subclinical keratoconus and the suspicious D-score in the Belin-Ambrósio enhanced ectasia display. In the feature importance analysis, out of 18 features, IOP was the highest ranked feature when comparing the average value of the relative attributions of the three individual AI models, followed by the difference in the value of mean corneal power. Conclusion: An AI model using the results of basic ophthalmologic examination has the potential to recommend corneal topography for keratoconus. In this AI algorithm, IOP and the difference between the two eyes, which may be undervalued clinical information, were important factors in the success of the AI model, and may be worth further reviewing in research and clinical practice for keratoconus screening.
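The hard and soft voting ensembles named in the Methods differ in what they combine: hard voting counts each model's yes/no decision, while soft voting averages the predicted probabilities before thresholding. The sketch below shows the difference. The probabilities and the 0.5 threshold are assumptions for illustration, not values from the study.

```python
def hard_vote(probabilities, threshold=0.5):
    """Majority vote over each model's thresholded decision."""
    votes = [p >= threshold for p in probabilities]
    return sum(votes) > len(votes) / 2


def soft_vote(probabilities, threshold=0.5):
    """Threshold the mean of the models' predicted probabilities."""
    return sum(probabilities) / len(probabilities) >= threshold


# Hypothetical probabilities that one patient needs corneal topography,
# as produced by three individual models.
patient = [0.45, 0.48, 0.95]

print("hard voting refers patient:", hard_vote(patient))
print("soft voting refers patient:", soft_vote(patient))
```

In this example two models fall just below the threshold and one is highly confident, so only soft voting refers the patient. That is one plausible mechanism by which soft voting could reach higher sensitivity, though the abstract does not state the cause.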

  • Preprint Article
  • 10.5194/egusphere-egu25-166
Impact of using additional precipitation data from the uppermost region on improving the performance of AI models in predicting groundwater levels
  • Mar 18, 2025
  • Mun-Ju Shin + 6 more

Groundwater is an important water resource that is widely used worldwide for agricultural, industrial, and domestic purposes. In the case of Jeju Island, located in southern South Korea, groundwater is an indispensable water resource that accounts for 82% of the total water supply. Therefore, scientific prediction and management of groundwater levels are very important for the sustainable use of groundwater by citizens. This study additionally used precipitation data from the Baekrokdam Climate Change Observatory located on the summit of Jeju Island in artificial intelligence (AI) models to accurately predict one-month-ahead future groundwater levels for the mid-mountainous areas of Jeju Island, where groundwater levels are highly variable. In other words, the AI models compared and analyzed the improvement effect of the monthly groundwater level prediction performance for 1) using precipitation data from two rainfall stations, groundwater withdrawal data from two groundwater sources, and groundwater level data from two monitoring wells in the study area, and 2) adding precipitation data from Baekrokdam Climate Change Observatory. The study subjects are two groundwater level monitoring wells located at 435-471 m above mean sea level in the southeast of Jeju Island. The AI models used to predict groundwater levels are Artificial Neural Network (ANN) and Long Short-Term Memory (LSTM), a deep learning AI model. As a result, when the Baekrokdam precipitation data were not used, the two AI models showed excellent groundwater level prediction performance with Nash-Sutcliffe efficiency (NSE) values of 0.871 or higher. The LSTM model showed relatively higher prediction performance for high and low groundwater levels than the ANN model. This means that the LSTM model adequately incorporates the seasonal effects of wet and dry periods into groundwater level simulations. 
The more volatile the observed groundwater level, the more difficult it is for the AI models to interpret the characteristics of groundwater level fluctuations, and the lower the performance of predicting future groundwater levels. When additional Baekrokdam precipitation data were used, the two AI models showed improved groundwater level prediction performance by having NSE values of 0.907 or higher. This means that the additional use of precipitation data located in the uppermost region provides more information to help interpret groundwater levels, allowing AI models to better interpret the characteristics of groundwater level fluctuations. In addition, the use of Baekrokdam precipitation data was more helpful in improving groundwater level prediction for the monitoring well, which has highly variable groundwater levels that are difficult to predict, and the ANN model with relatively low groundwater level prediction performance. When additional Baekrokdam precipitation data was used for a specific monitoring well, the groundwater level prediction performance of the ANN model was improved to a level comparable to that of the LSTM model, which is a deep learning AI, even with a relatively simple ANN model structure. This is an example of how important it is to use additional useful data in research using AI models.
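The Nash-Sutcliffe efficiency (NSE) used above compares the model's squared errors with the variance of the observations: NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))². A value of 1 is a perfect fit and 0 is no better than predicting the observed mean. The sketch below implements that standard formula. The groundwater levels are hypothetical and not taken from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency of simulated values against observations."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_observed) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - residual / variance


# Hypothetical monthly groundwater levels in metres, for illustration only.
observed = [10.0, 12.0, 14.0, 12.0]
simulated = [10.5, 11.5, 13.5, 12.5]

print(f"NSE = {nash_sutcliffe(observed, simulated):.3f}")
```

Because the denominator is the variance of the observations, a fixed prediction error costs more NSE at a well with stable levels than at a highly variable one.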

  • Research Article
  • Cite Count: 7
  • 10.1055/a-2121-8380
Doctors Identify Hemorrhage Better during Chart Review when Assisted by Artificial Intelligence.
  • Aug 1, 2023
  • Applied Clinical Informatics
  • Martin S Laursen + 5 more

This study evaluated if medical doctors could identify more hemorrhage events during chart review in a clinical setting when assisted by an artificial intelligence (AI) model and medical doctors' perception of using the AI model. To develop the AI model, sentences from 900 electronic health records were labeled as positive or negative for hemorrhage and categorized into one of 12 anatomical locations. The AI model was evaluated on a test cohort consisting of 566 admissions. Using eye-tracking technology, we investigated medical doctors' reading workflow during manual chart review. Moreover, we performed a clinical use study where medical doctors read two admissions with and without AI assistance to evaluate performance when using and perception of using the AI model. The AI model had a sensitivity of 93.7% and a specificity of 98.1% on the test cohort. In the use studies, we found that medical doctors missed more than 33% of relevant sentences when doing chart review without AI assistance. Hemorrhage events described in paragraphs were more often overlooked compared with bullet-pointed hemorrhage mentions. With AI-assisted chart review, medical doctors identified 48 and 49 percentage points more hemorrhage events than without assistance in two admissions, and they were generally positive toward using the AI model as a supporting tool. Medical doctors identified more hemorrhage events with AI-assisted chart review and they were generally positive toward using the AI model.
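Sensitivity and specificity, as reported for the AI model above, are the fractions of truly positive and truly negative items that the model classifies correctly. The sketch below computes both from paired labels. The labels are hypothetical and do not reproduce the study's test cohort.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = hemorrhage)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical sentence-level labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```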

  • Research Article
  • 10.1093/humrep/deae108.541
P-170 Clinical evaluation of an image-based artificial intelligence model for embryo selection: a double-blinded randomized comparative reader study
  • Jul 3, 2024
  • Human Reproduction
  • V Suraj + 5 more

Study question: What is the performance of an image-based artificial intelligence (AI) model for ranking blastocyst stage embryos compared to embryologists using traditional morphology?
Summary answer: The AI was non-inferior to manual embryo selection. The AI showed significant improvement in clinical pregnancy when there was disagreement between AI and manual selection.
What is known already: In previous work, we developed an image-based AI model that predicts the likelihood of clinical pregnancy by analyzing a single static image of a blastocyst captured prior to biopsy or freeze. This model was trained on data from over 8,000 single-blastocyst transfer cycles from multiple U.S. IVF clinics performed between 2014 and 2021.
Study design, size, duration: We performed a retrospective, double-blinded, comparative reader study. The study included data from 438 single-blastocyst transfers from 10 different IVF clinics in the U.S. that were not part of previous model development or testing. Using this data, a set of 1,257 virtual patient panels were created. Each virtual patient panel included between 2 and 5 embryos that were matched by age (18-29, 30-34, 35-37, ≥38), race (white, non-white, and unknown) and PGT-status.
Participants/materials, setting, methods: A group of 5 embryologists (readers) with varying levels of experience were asked to select their top embryo for transfer for each virtual patient panel (control arm) based on morphology grades. The AI model was also used to select a top embryo for transfer from each patient panel (treatment arm). The clinical pregnancy rates of the top-selected embryos were calculated and compared.
Main results and the role of chance: There was disagreement on the top pick embryo amongst the five embryologists 34.6% of the time, which increased to 43.9% when there were 3 or more embryos to choose from, supporting the need for a tool to standardize this decision. The clinical pregnancy rate of the control arm (average of embryologist readers) was 61.0% (individual rates of 58.9%, 59.6%, 61.5%, 61.6%, and 63.3%), and the clinical pregnancy rate of the treatment arm (AI model) was 62.3% (demonstrating non-inferiority with p < .001). The pregnancy rate of random embryo selection was 53.2%. All 5 of the embryologist readers agreed on the top-pick embryo 65% of the time. When all 5 readers agreed, the AI model disagreed with that consensus 31% of the time, and in these cases the AI model pregnancy rate was significantly higher by 8.6% (AI: 63.1%, Embryologist Consensus: 54.5%; p < 0.05).
Limitations, reasons for caution: While data for this study was collected prospectively, the analysis was done retrospectively. Furthermore, readers were provided morphology grades from retrospective data rather than grading the embryos themselves. However, it is common for an embryologist to make an embryo selection based on morphology grades previously assigned by a different embryologist.
Wider implications of the findings: The AI model was able to select the top embryo for transfer with performance comparable to experienced embryologists. Such a model could allow for automated and objective embryo selection using a single static image of blastocyst stage embryos.
Trial registration number: NA

  • Research Article
  • Cite Count: 4
  • 10.1093/jas/skae234.078
6 Transforming animal agriculture through hybrid modeling and quantum computing
  • Sep 13, 2024
  • Journal of Animal Science
  • Luis O Tedeschi

Quantum computing (QC) is not a futuristic notion in agriculture, though its full potential has yet to be realized. QC is an emerging field at the intersection of physics and computer science that holds immense potential to revolutionize various sectors, including agriculture production and artificial intelligence (AI) modeling. While QC is still in the early stages of development and practical applications within agriculture are not yet widespread, researchers are actively exploring its potential benefits in various agricultural domains, including crop optimization, livestock breeding, and environmental monitoring. QC harnesses the principles of quantum mechanics to perform computations using quantum bits or qubits, which can exist in multiple states simultaneously. Unlike classical computers, which rely on binary bits representing 0 or 1, quantum computers exploit phenomena such as superposition and entanglement to process information in parallel, potentially offering exponential speedup for certain types of problems. In agriculture production, particularly in animal science, QC offers promising avenues for optimizing processes and enhancing productivity. Quantum algorithms can analyze vast amounts of genomic data to improve breeding programs, leading to the development of more resilient and productive livestock breeds. Furthermore, QC can facilitate precision farming techniques by modeling complex environmental factors and animal behavior to optimize feeding strategies, disease management, and overall farm management practices. Moreover, QC can significantly benefit AI modeling by accelerating computations and enabling more efficient training of AI models. Quantum algorithms can enhance the performance of AI algorithms in various tasks, including pattern recognition, natural language processing, and predictive analytics. 
By leveraging quantum-enhanced optimization algorithms, AI models can achieve better convergence and accuracy, leading to more effective decision-making and problem-solving capabilities. While hybrid intelligent models also represent a novel frontier in agriculture, QC has the potential to expedite the merging of mechanistic and AI modeling paradigms, facilitating a more holistic understanding of complex systems in agriculture and beyond. By integrating mechanistic models, which describe the underlying physical processes, with AI models, which learn patterns from data, quantum computing can enable comprehensive simulations and predictions of agricultural systems. This fusion of modeling paradigms can lead to more accurate and robust predictions of crop yields, livestock performance, and environmental impacts, facilitating informed decision-making for farmers and policymakers. The application of QC in agriculture, however, requires interdisciplinary collaborations between physicists, computer scientists, agronomists/animal scientists, and AI researchers. These collaborations can drive the development of quantum algorithms tailored to agricultural applications, the integration of quantum-enhanced AI techniques into existing modeling frameworks, and the deployment of QC resources in real-world agricultural systems. Ultimately, harnessing the power of QC holds the potential to revolutionize agriculture production practices, including regenerative agriculture, and advance AI modeling capabilities, paving the way for a more sustainable and efficient agricultural industry.

  • Research Article
  • Cite Count Icon 1
  • 10.3390/educsci15040403
Evaluating an Artificial Intelligence (AI) Model Designed for Education to Identify Its Accuracy: Establishing the Need for Continuous AI Model Updates
  • Mar 23, 2025
  • Education Sciences
  • Navdeep Verma + 3 more

The growing popularity of online learning brings with it inherent challenges that must be addressed, particularly in enhancing teaching effectiveness. Artificial intelligence (AI) offers potential solutions by identifying learning gaps and providing targeted improvements. However, to ensure their reliability and effectiveness in educational contexts, AI models must be rigorously evaluated. This study aimed to evaluate the performance and reliability of an AI model designed to identify the characteristics and indicators of engaging teaching videos. The research employed a design-based approach, incorporating statistical analysis to evaluate the AI model’s accuracy by comparing its assessments with expert evaluations of teaching videos. Multiple metrics were employed, including Cohen’s Kappa, Bland–Altman analysis, the Intraclass Correlation Coefficient (ICC), and Pearson/Spearman correlation coefficients, to compare the AI model’s results with those of the experts. The findings indicated low agreement between the AI model’s assessments and those of the experts. Cohen’s Kappa values were low, suggesting minimal categorical agreement. Bland–Altman analysis showed moderate variability with substantial differences in results, and both Pearson and Spearman correlations revealed weak relationships, with values close to zero. The ICC indicated moderate reliability in quantitative measurements. Overall, these results suggest that the AI model requires continuous updates to improve its accuracy and effectiveness. Future work should focus on expanding the dataset and utilise continual learning methods to enhance the model’s ability to learn from new data and improve its performance over time.
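The abstract above reports Cohen's Kappa as its measure of categorical agreement between the AI model and the experts. As a minimal sketch of how that statistic is computed for two raters (this is not the study's code; the function name `cohens_kappa` and the labels below are hypothetical and invented purely for illustration), kappa compares observed agreement with the agreement expected by chance from each rater's label frequencies:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels, not data from the study.
ai_labels = ["engaging", "engaging", "not", "not", "engaging", "not"]
expert_labels = ["engaging", "not", "not", "engaging", "engaging", "not"]
kappa = cohens_kappa(ai_labels, expert_labels)
print(round(kappa, 3))
```

A value near 0 means agreement is no better than chance and a value of 1 means perfect agreement, which is why the low values reported in the abstract are read as minimal categorical agreement.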

  • Research Article
  • 10.1111/eje.70186
Comparing the Performance of Different Artificial Intelligence Tools in Evaluating Dental Morphology Model Assignments.
  • May 8, 2026
  • European journal of dental education : official journal of the Association for Dental Education in Europe
  • Ayşegül Hazir + 2 more

This study aimed to compare scores obtained for evaluating maxillary left canine tooth models prepared from soap in a dental morphology course using different artificial intelligence (AI) models and dental educators with the same rubric, and to evaluate the feedback generated by the AI models. Assignment models prepared by students were scored by ChatGPT 5.2, Gemini 3 Pro, and Grok 4.1 AI tools, and by dental educators using the same evaluation criteria. The quality of feedback generated by AI models was evaluated by experts using the Global Quality Scale (GQS). Data were analysed using SPSS v27.0, and normality was assessed using the Shapiro-Wilk test. Statistical differences between the three AI tools and expert scores were examined using the Friedman Test and Bonferroni-corrected multiple comparisons, and agreement among evaluators was assessed using Kendall's W coefficient. A significant difference was found between the AI models and expert ratings (p < 0.001), with all AI models receiving higher scores than the experts. Significant differences were also found among the AI models' GQS scores (p < 0.001); Gemini 3 Pro produced the highest feedback quality, while ChatGPT 5.2 produced the lowest. AI models can be used as supportive tools in the assessment and feedback processes in dental education; however, in terms of contextual awareness and personalised feedback, they are not yet at a level to replace expert evaluations.

  • Research Article
  • Cite Count Icon 3
  • 10.1016/j.ekir.2025.05.035
Chronic Changes on Kidney Histology by a Multiclass Artificial Intelligence Model
  • May 29, 2025
  • Kidney International Reports
  • Aleksandar Denic + 12 more

Introduction: Chronic changes in kidney histology are often approximated by using human vision but with limited accuracy. Methods: An interactive annotation tool trained an artificial intelligence (AI) model for segmenting structures on whole slide images (WSIs) of kidney tissue. A total of 20,509 annotations trained the AI model with 20 classes of structures, including separate detection of cortex from medulla. We compared the AI model detections with human-based annotations in an independent validation set. The AI model was then applied to 1426 donors and 1699 patients with renal tumor to calculate chronic changes as defined by measures of nephron size (glomerular volume, cortex volume per glomerulus, and mean tubular areas) and nephrosclerosis (globally sclerotic glomeruli, increased interstitium, increased tubular atrophy (TA), arteriolar hyalinosis (AH), and artery luminal stenosis from intimal thickening). We then assessed whether chronic kidney disease (CKD) outcomes were associated with these chronic changes. Results: During the AI model validation step, the agreement between the AI detections and human annotations was similar to the agreement between human pairs, except that the AI model showed less agreement with AH. Chronic changes calculated solely from AI-based detections associated with low glomerular filtration rate (GFR) during follow-up after kidney donation and with kidney failure after a radical nephrectomy for tumor. A chronicity score based on AI detections was calculated from cortex per glomerulus, percent glomerulosclerosis, TA foci density, and mean area of AH lesions and showed good prognostic discrimination for kidney failure (cross-validation C-statistic = 0.819). Conclusion: A multiclass AI model can help automate quantification of chronic changes on WSIs of kidney histology.
