2025 Ginkgo Datapoints Antibody Developability Competition outcomes: limited model performance and a call for data standardization

Abstract

The Ginkgo Datapoints Antibody Developability (AbDev) Competition, a blinded benchmark for developability prediction in which every antibody was characterized on a single, industrial-scale experimental platform, was conducted from September 8 to November 18, 2025. We benchmarked predictors across five biophysical properties (hydrophobicity, thermostability, self-association, expression titer, and polyreactivity) using a public training set of 246 clinical antibodies and a blinded, held-out test set of 80 antibodies. We received submissions from 113 teams spanning 25 countries, 38 companies, and 39 universities. Winning submissions differed by assay. Top Spearman’s ρ values on the test set reached 0.708 (hydrophobicity), 0.392 (thermostability), 0.356 (polyreactivity), 0.337 (self-association), and 0.310 (titer). Cross-validation scores on the public training set consistently exceeded held-out test performance, indicating overfitting and limited out-of-distribution generalization. Together, these results provide a standardized snapshot of current antibody developability modeling capabilities and highlight a key bottleneck: available datasets are too small and heterogeneous to support robust, assay-spanning prediction. Meaningful progress will require larger, standardized, and diverse experimental datasets, with harmonized protocols and rich metadata, to train and validate models that generalize reliably in future antibody discovery campaigns.
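Submissions were ranked by Spearman's ρ between predicted and measured values for each assay. As a minimal sketch, assuming the textbook definition (the Pearson correlation of average ranks, so tied values share a rank), the metric can be computed with the standard library alone; the function names are illustrative and are not taken from the competition's evaluation code.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman_rho(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on average ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(measured))
```

Because ρ depends only on rank order, a predictor is rewarded for ordering antibodies correctly within an assay even if its outputs are on a different scale from the measurements.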

Similar Papers
  • Research Article
  • Cited by 3
  • 10.1016/j.heliyon.2024.e35236
Automated classification of angle-closure mechanisms based on anterior segment optical coherence tomography images via deep learning
  • Jul 26, 2024
  • Heliyon
  • Ye Zhang + 11 more

  • Research Article
  • 10.1158/1535-7163.targ-23-b016
Abstract B016: AI analysis of histological images accurately identifies luminal subtype urothelial carcinomas characterized by high PPARG expression
  • Dec 1, 2023
  • Molecular Cancer Therapeutics
  • Stefan Kirov + 12 more

Background: PPARG is a cell lineage-determining transcription factor in muscle-invasive urothelial carcinoma (MIUC), where high expression is associated with the luminal subtype (1, 2). As FX-909 (2), a PPARG inverse agonist, enters the clinic, biomarkers that reflect the luminal subtype will reveal patients with the potential to respond to PPARG inhibition. Luminal status is generally determined via RNA-seq and/or multiple immunohistochemistry stains, which are costly and time-consuming. However, MIUC biopsies are routinely stained with hematoxylin and eosin (H&E). Machine learning (ML)-driven analyses of H&E-stained tissue may enable the identification of patients with luminal MIUC and have advantages over the current molecular approach. Methods: H&E-stained slides from 367 unique primary MIUC cases from the TCGA BLCA dataset were split into training (70%), validation (15%), and held-out test (15%) sets, preserving the distribution of patient metadata. A curated retrospective cohort of 42 localized, stage III-IV primary MIUCs was used as an independent test set. Molecular classification as luminal (luminal papillary, luminal, and luminal infiltrated subtypes) or non-luminal (basal-squamous and neuronal subtypes) was performed and used as ground truth (3). Pretrained artifact and tissue segmentation models were deployed on all images to identify artifact-free areas of cancer and cancer-associated stroma. An end-to-end (E2E) additive multiple instance learning model was trained on the training set to identify luminal cases. Top-performing model iterations were compared on the validation set, and the optimal iteration was deployed on both test sets. Results: We assessed the performance of our E2E model in predicting luminal status, using the molecular subtypes derived from Robertson et al. as ground truth (3). 
The E2E model showed excellent performance when predicting luminal status in the TCGA validation, TCGA test, and independent test sets (AUROC = 0.96, 0.95, and 0.97, respectively). The accuracy in all three cohorts was 89-90%, with a sensitivity of 0.86-0.96, a specificity of 0.82-0.94, and an F1 score of 0.88-0.9. Conclusions: We generated a robust ML model that accurately predicts luminal MIUC using H&E-stained slides. Luminal MIUC is dependent on PPARG, and PPARG inverse agonism represents a promising therapeutic approach for MIUC. Coupled with the first-in-class FX-909 therapeutic entering the clinic, the strong performance of our model highlights the potential for its application as a precision biomarker to identify patients with advanced urothelial carcinoma likely to respond to PPARG inhibition.
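The abstract reports AUROC alongside accuracy, sensitivity, specificity, and F1. A minimal sketch of how these are typically computed for a binary task, assuming "luminal" is the positive class (label 1) and the model emits a continuous score. AUROC is computed here as the probability that a randomly chosen positive case outscores a randomly chosen negative one (ties count one half), which equals the area under the ROC curve. The function names, threshold, and inputs are illustrative assumptions.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison of positive and negative scores (O(n^2))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, precision and F1 at a fixed score cut-off."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }
```

AUROC is threshold-free, whereas the other four numbers change with the chosen cut-off, which is why studies usually report both kinds.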

  • Research Article
  • 10.1161/circ.152.suppl_3.4368814
Abstract 4368814: Explainable AutoML-Derived XGBoost Model for One-Year Mortality Prediction in HFpEF Patients
  • Nov 4, 2025
  • Circulation
  • Fares Alahdab + 4 more

Background: Heart failure with preserved ejection fraction (HFpEF) accounts for approximately half of all heart failure hospitalizations; yet, tools for individualized 1-year mortality risk stratification remain limited. We conducted a retrospective study to develop and validate a machine learning model using genetic programming and automated model search to predict 1-year all-cause mortality in HFpEF patients admitted for decompensated HF. Methods: Electronic medical records from Mayo Clinic (2010-2020) were queried to identify 7,840 adult patients hospitalized with HF exacerbation and a left ventricular ejection fraction >50% on echocardiograms performed between six months pre-admission and seven days post-discharge. Demographic, clinical, laboratory, medication, comorbidity, and imaging data available at the time of index hospitalization were extracted. The cohort was split into 80% training (n=6,272) and 20% held-out test (n=1,568) sets. After standard preprocessing, we performed an evolutionary AutoML search. The final pipeline applied a RobustScaler, two sequential feature-union blocks, and an XGBoost classifier. Model performance on the test set was assessed by area under the receiver operating characteristic curve (AUC), Brier score, accuracy, precision, recall, and F1 score, each with bootstrap-derived 95% confidence intervals (CIs). For interpretability, SHAP (SHapley Additive exPlanations) values were computed using the full scaled training set as background. Results: On the held-out test set, the XGBoost-based model achieved an AUC of 0.7595 (95% CI: 0.7307-0.7870) and a Brier score of 0.1764 (95% CI: 0.1652-0.1890). Accuracy was 0.7406 (95% CI: 0.7143-0.7645), precision 0.6349 (95% CI: 0.5759-0.6942), recall 0.4135 (95% CI: 0.3636-0.4609), and F1 score 0.5008 (95% CI: 0.4529-0.5444). The global SHAP summary plot (Figure 1) ranked age, serum albumin, blood urea nitrogen (BUN), presence of renal failure, and NT-proBNP as the top 5 predictors. 
Conclusions: In this large HFpEF cohort, an AutoML-derived XGBoost pipeline achieved robust discrimination (AUC≈0.76) and calibration (Brier≈0.18) for 1-year mortality prediction. SHAP-based explanations highlighted the importance of age, nutritional status (albumin), renal biomarkers (BUN, renal failure), and natriuretic peptides (NT-proBNP) as principal drivers of risk. These interpretable machine learning findings may guide personalized risk stratification and identify therapeutic targets in HFpEF.
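The Brier score is the mean squared difference between a predicted probability and the observed 0/1 outcome; lower is better, and it reflects both calibration and discrimination. The abstract does not say which bootstrap variant produced its 95% CIs, so this sketch assumes a simple percentile bootstrap that resamples test-set patients with replacement, with a fixed seed for reproducibility. Function names and inputs are illustrative.

```python
import random
from typing import Callable, Sequence

Metric = Callable[[Sequence[float], Sequence[int]], float]


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def percentile_bootstrap_ci(
    metric: Metric,
    probs: Sequence[float],
    outcomes: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap CI: resample patients with replacement, recompute the metric."""
    rng = random.Random(seed)
    n = len(probs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([probs[i] for i in idx], [outcomes[i] for i in idx]))
    stats.sort()
    k = round(alpha / 2 * n_boot)  # replicates trimmed from each tail
    return stats[k], stats[n_boot - 1 - k]
```

The same resampling loop works for AUC or F1, with one caveat for AUC: a resample may occasionally contain only one outcome class, which a real implementation would need to skip or redraw.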

  • Research Article
  • 10.1080/19420862.2026.2646361
Deep learning assessment of nativeness and pairing likelihood for antibody and nanobody design with AbNatiV2
  • Dec 31, 2026
  • mAbs
  • Aubin Ramon + 6 more

Most antibodies produced by the immune system combine good binding and stability with low toxicity and self-reactivity. Quantifying the nativeness of a candidate sequence (its likelihood of belonging to natural immune repertoires) has thus emerged as a valuable strategy for hit selection from synthetic libraries, for optimization and humanization, and for guiding de novo design toward developable candidates. We previously introduced AbNatiV, a transformer‐based VQ-VAE for nativeness assessment, which proved effective across multiple nanobody engineering tasks. However, AbNatiV1 operated on unpaired sequences, limiting its applicability to conventional VH-VL antibodies. Moreover, its performance on nanobody nativeness was constrained by the limited number and diversity of nanobody repertoires available at the time. Here, we sequenced new camelid repertoires, curated additional recent datasets, and present AbNatiV2: an enhanced architecture comprising several models, each trained on ≥20 million sequences. AbNatiV2 improves nanobody nativeness classification across held-out and diverse test sets, and more robustly detects nativeness changes upon CDR grafting. We also introduce p-AbNatiV2, a cross-attention model fine-tuned on 3.7 million paired human sequences. p-AbNatiV2 provides residue- and sequence-level humanness for VH/VL pairs and learns pairing likelihood via noise-contrastive training. On held-out tests, it assigns the native pair a higher score in 74% of cases, substantially outperforming recent pairing models. Together, AbNatiV2 and p-AbNatiV2 extend nativeness assessment and engineering to both nanobodies and conventional antibodies, supporting design decisions at single-residue, Fv-sequence, and paired-domain levels. AbNatiV2 is available as downloadable software and as a webserver.

  • Research Article
  • Cited by 4
  • 10.1002/mp.17504
Generalizability of lesion detection and segmentation when ScaleNAS is trained on a large multi-organ dataset and validated in the liver.
  • Nov 22, 2024
  • Medical physics
  • Jingchen Ma + 10 more

Tumor assessment through imaging is crucial for diagnosing and treating cancer. Lesions in the liver, a common site of metastatic disease, are particularly challenging to detect and segment accurately. This labor-intensive task is subject to individual variation, which drives interest in automation using artificial intelligence (AI). Our aim was to evaluate AI for lesion detection and lesion segmentation on CT in the context of human performance on the same task. Internal testing was used to determine how an AI-developed model (ScaleNAS) trained on lesions in multiple organs performs when tested specifically on liver lesions in a dataset integrating real-world and clinical trial data. External testing was used to evaluate whether ScaleNAS's performance generalizes to publicly available colorectal liver metastases (CRLM) from The Cancer Imaging Archive (TCIA). The CUPA study dataset included patients at Columbia University (CUIMC, n=5011) whose CT scans of the chest, abdomen, or pelvis between 2010 and 2020 indicated solid tumors, together with patients from two clinical trials in metastatic colorectal cancer, PRIME (n=1183) and Amgen (n=463). Inclusion required ≥1 measurable lesion; exclusion criteria eliminated 1566 patients. Data were divided at the patient level into training (n=3996), validation (n=570), and testing (n=1529) sets. To create the reference standard for training and validation, each case was annotated by one of six randomly assigned radiologists, who marked the CUPA lesions without access to any previous annotations. For internal testing, we restricted the CUPA test set to patients with liver lesions (n=525) and formed an enhanced reference standard through expert consensus review of prior annotations. For external testing, TCIA-CRLM (n=197) formed the test set; its reference standard was formed by two new radiologists through consensus review of the original annotations and contours. Metrics for lesion detection were sensitivity and false positives. 
Lesion segmentation was assessed with median Dice coefficient, under-segmentation ratio (USR), and over-segmentation ratio (OSR). Subgroup analysis examined the influence of lesion size ≥ 10mm (measurable by RECIST1.1) versus all lesions (important for early identification of disease progression). ScaleNAS trained on all lesions achieved sensitivity of 71.4% and Dice of 70.2% for liver lesions in the CUPA internal test set (3,495 lesions) and sensitivity of 68.2% and Dice 64.2% in the TCIA-CRLM external test set (638 lesions). Human radiologists had mean sensitivity of 53.5% and Dice of 73.9% in CUPA and sensitivity of 84.1% and Dice of 88.4% in TCIA-CRLM. Performance improved for ScaleNAS and radiologists in the subgroup of lesions that excluded sub-centimeter lesions. Our study presents the first evaluation of ScaleNAS in medical imaging, demonstrating its liver lesion detection and segmentation performance across diverse datasets. Using consensus reference standards from multiple radiologists, we addressed inter-observer variability and contributed to consistency in lesion annotation. While ScaleNAS does not surpass radiologists in performance, it offers fast and reliable results with potential utility in providing initial contours for radiologists. Future work will extend this model to lung and lymph node lesions, ultimately aiming to enhance clinical applications by generalizing detection and segmentation across tissue types.
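The Dice coefficient compares a predicted mask A with a reference mask B as 2|A∩B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal sketch, assuming masks are given as sets of voxel coordinates and that two empty masks count as perfect agreement (a convention, not something the abstract specifies). The under- and over-segmentation ratios are omitted because the abstract does not define them.

```python
from statistics import median
from typing import Iterable

Voxel = tuple[int, int, int]


def dice(pred: set[Voxel], ref: set[Voxel]) -> float:
    """Dice overlap between a predicted and a reference voxel set."""
    if not pred and not ref:
        return 1.0  # assumed convention: both masks empty means perfect agreement
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def median_dice(pairs: Iterable[tuple[set[Voxel], set[Voxel]]]) -> float:
    """Median of per-lesion Dice scores, as reported in the abstract."""
    return median([dice(p, r) for p, r in pairs])
```

Reporting the median rather than the mean limits the influence of a few lesions with near-zero overlap, which tend to be small lesions.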

  • Research Article
  • Cited by 1
  • 10.1016/j.eclinm.2025.103524
Development and validation of an AI foundation model for endoscopic diagnosis of esophagogastric junction adenocarcinoma: a cohort and deep learning study
  • Sep 27, 2025
  • eClinicalMedicine
  • Yikun Ma + 22 more

Background: The early detection of esophagogastric junction adenocarcinoma (EGJA) is crucial for improving patient prognosis, yet its current diagnosis is highly operator-dependent. This paper makes a first attempt to develop an artificial intelligence (AI) foundation model-based method for both screening and staging diagnosis of EGJA using endoscopic images. Methods: In this cohort and deep learning study, we conducted a multicentre study across seven Chinese hospitals between December 28, 2016 and December 30, 2024. The dataset comprises 12,302 images from 1546 patients (590 with advanced EGJA, 243 with early EGJA, 713 without EGJA); 8249 images were used for model training, while the remainder were divided into held-out (112 patients, 914 images), external (230 patients, 1539 images), and prospective (198 patients, 1600 images) test sets for evaluation. The proposed model employs DINOv2 (a vision foundation model) and ResNet50 (a convolutional neural network) to extract features of the global appearance and local details of endoscopic images for EGJA staging diagnosis. Model performance is assessed using accuracy, sensitivity, specificity, positive predictive value, negative predictive value, area under the receiver operating characteristic curve, average precision, and kappa. Thirty endoscopists with varying experience levels were recruited for comparative and AI-assisted evaluations. Findings: Our model demonstrates satisfactory performance for EGJA staging diagnosis across the three test sets, achieving an accuracy of 0.9256 (95% CI 0.9086–0.9426), 0.8895 (95% CI 0.8739–0.9052), and 0.8956 (95% CI 0.8813–0.9112), respectively. In contrast, among representative AI models, the best one (ResNet50) achieves an accuracy of 0.9125 (95% CI 0.8942–0.9308), 0.8382 (95% CI 0.8198–0.8566), and 0.8519 (95% CI 0.8345–0.8693) on the three test sets, respectively; the expert endoscopists achieve an accuracy of 0.8147 (95% CI 0.7895–0.8399) on the held-out test set. 
Statistical analysis reveals that our model significantly outperforms representative AI models and endoscopists (all P < 0.05), with the exception of ResNet50 on the held-out test set (P = 0.54). Moreover, with the assistance of our model, the overall accuracy of trainee, competent, and expert endoscopists improves from 0.7035 (95% CI 0.6739–0.7331), 0.7350 (95% CI 0.7064–0.7636), and 0.8147 (95% CI 0.7895–0.8399) to 0.8497 (95% CI 0.8265–0.8728), 0.8521 (95% CI 0.8291–0.8751), and 0.8696 (95% CI 0.8478–0.8914), respectively. Interpretation: To our knowledge, our model is the first application of foundation models to EGJA staging diagnosis and demonstrates great potential in both diagnostic accuracy and efficiency. In addition, the model's outputs can be visually probed and interpreted, highlighting its clinical benefits in precision therapy. Nevertheless, the study has limitations, such as the regional constraint of the data sources and the restriction to white-light and narrow-band imaging modalities, which may limit the generalizability of the developed model. Funding: This study was supported by the Shanghai Health Development Commission, Shanghai Science and Technology Commission, Tongji University, and Shanghai Municipal Commission of Economy and Informatization.
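Kappa measures agreement between predicted and reference labels beyond what the label frequencies alone would produce by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement. The abstract does not specify the variant, so this sketch assumes unweighted Cohen's kappa over the three staging classes (advanced EGJA, early EGJA, no EGJA); the function name and labels are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(pred: Sequence[Hashable], ref: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    n = len(pred)
    if n == 0 or n != len(ref):
        raise ValueError("need two non-empty sequences of equal length")
    p_o = sum(a == b for a, b in zip(pred, ref)) / n  # observed agreement
    cp, cr = Counter(pred), Counter(ref)
    # Chance agreement: probability both raters pick the same class independently.
    p_e = sum(cp[k] * cr[k] for k in cp) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

Unlike raw accuracy, kappa is 0 for a classifier that agrees with the reference only as often as its class frequencies predict by chance, which matters when classes such as early EGJA are less common.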

  • Research Article
  • 10.64898/2026.01.20.26344425
Artificial Intelligence-Enabled Echocardiographic Assessment of Right Ventricular Function
  • Jan 22, 2026
  • medRxiv
  • Márton Tokodi + 12 more

Background: Right ventricular (RV) function is an important predictor of morbidity and mortality in various cardiovascular conditions. Nevertheless, its echocardiographic assessment is challenging due to the RV's complex anatomy and position in the chest, resulting in limited inter-observer reproducibility. Objectives: We aimed to develop a novel deep learning model, EchoNet-RV, to segment the RV in apical 4-chamber view (A4C) echocardiographic videos and estimate RV fractional area change (RVFAC). Methods: EchoNet-RV was trained on 7,169 expert-annotated A4C echocardiographic videos. The model’s performance was evaluated on a held-out internal test set of 1,320 A4C videos and two international external test sets of 3,107 and 1,077 A4C videos from two separate centers. Additionally, the associations between the predicted RVFAC values and the composite endpoint of heart failure hospitalization or all-cause death were analyzed in the first external test set. Results: EchoNet-RV segmented the RV with Dice coefficients of 0.893 (0.891–0.895), 0.797 (0.796–0.798), and 0.788 (0.785–0.790) and predicted RVFAC with mean absolute errors of 5.795 (5.560–6.031), 5.830 (5.692–5.970), and 6.362 (6.064–6.660) percentage points in the held-out test set and the two external test sets, respectively. In 500 randomly selected videos from the external test sets, EchoNet-RV’s prediction error was significantly lower than the inter-observer variability (p<0.001). Moreover, it identified RVFAC <35% with areas under the receiver operating characteristic curve of 0.859 (0.843–0.876), 0.725 (0.710–0.740), and 0.684 (0.653–0.713) in the three test sets. EchoNet-RV also outperformed two multi-task models, EchoPrime and PanEcho, in estimating RVFAC and identifying RV dysfunction in the external test sets. 
In the first external test set, predicted RVFAC values were inversely associated with the composite endpoint (adjusted HR: 0.948 [0.917–0.979], p<0.001), independent of age, sex, cardiovascular risk factors, and left ventricular systolic function. Conclusions: EchoNet-RV enables the rapid and automated assessment of RVFAC, with strong potential to become a valuable tool for the echocardiographic evaluation of RV function and disease surveillance.
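RV fractional area change is conventionally computed from the end-diastolic and end-systolic RV areas traced in the apical 4-chamber view, RVFAC = (EDA - ESA) / EDA × 100%. The abstract reports errors in percentage points and flags dysfunction below 35%. A minimal sketch of those calculations, assuming the two areas are already available (for example, measured from segmentation masks); the names and numbers are illustrative and not taken from EchoNet-RV.

```python
from typing import Sequence

DYSFUNCTION_CUTOFF = 35.0  # RVFAC below this value (in %) is treated as RV dysfunction


def rv_fac(ed_area: float, es_area: float) -> float:
    """RV fractional area change in percent from end-diastolic and end-systolic areas."""
    if ed_area <= 0:
        raise ValueError("end-diastolic area must be positive")
    return (ed_area - es_area) / ed_area * 100.0


def mae_percentage_points(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error between predicted and reference RVFAC, in percentage points."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def is_dysfunction(fac: float) -> bool:
    """Binary RV dysfunction call using the 35% cut-off quoted in the abstract."""
    return fac < DYSFUNCTION_CUTOFF
```

Because RVFAC is a ratio of two areas, a segmentation error that shrinks both traces proportionally leaves it unchanged, while errors at only one phase of the cardiac cycle propagate directly into the estimate.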

  • Research Article
  • Cited by 51
  • 10.3389/fmolb.2022.960194
Comparison of hydrophobicity scales for predicting biophysical properties of antibodies
  • Aug 31, 2022
  • Frontiers in Molecular Biosciences
  • Franz Waibl + 5 more

While antibody-based therapeutics have grown into one of the major classes of novel medicines, some antibody development candidates face significant challenges with expression levels, solubility, stability, and aggregation under physiological and storage conditions. A major determinant of these properties is surface hydrophobicity, which promotes nonspecific interactions and has repeatedly proven problematic in the development of novel antibody-based drugs. Multiple computational methods have been devised for in-silico prediction of antibody hydrophobicity, often using hydrophobicity scales to assign values to each amino acid. These approaches are usually validated by their ability to rank potential therapeutic antibodies in terms of their experimental hydrophobicity. However, there is significant diversity both in the hydrophobicity scales and in the experimental methods, and consequently in the performance of in-silico methods in predicting experimental results. In this work, we investigate the hydrophobicity of monoclonal antibodies using hydrophobicity scales. We implement several scoring schemes based on solvent accessibility and the assigned hydrophobicity values, and compare the different scores and scales based on their ability to predict retention times from hydrophobic interaction chromatography. We provide an overview of the strengths and weaknesses of several commonly employed hydrophobicity scales, thereby improving the understanding of hydrophobicity in antibody development. Furthermore, we test several datasets, both publicly available and proprietary, and find that the diversity of the dataset affects the performance of hydrophobicity scores. We expect that this work will provide valuable guidelines for the optimization of biophysical properties in future drug discovery campaigns.
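The scoring schemes described combine a per-residue hydrophobicity scale with solvent accessibility. A minimal sketch of one such scheme, assuming the Kyte-Doolittle scale (only one of the scales the study compares) and per-residue relative solvent accessibility values in [0, 1] supplied by an external structure tool. The exposure weighting and the hypothetical `positive_only` option are illustrative choices, not the paper's exact definitions.

```python
from typing import Mapping, Sequence

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def surface_hydrophobicity(
    sequence: str,
    rel_sasa: Sequence[float],
    scale: Mapping[str, float] = KYTE_DOOLITTLE,
    positive_only: bool = False,
) -> float:
    """Sum of scale values weighted by each residue's relative solvent exposure."""
    if len(sequence) != len(rel_sasa):
        raise ValueError("need one accessibility value per residue")
    total = 0.0
    for aa, exposure in zip(sequence, rel_sasa):
        value = scale[aa]
        if positive_only and value <= 0:
            continue  # optionally count only hydrophobic contributions
        total += value * exposure
    return total
```

Scores like this are then compared against measured HIC retention times, for example by rank correlation, which is where the choice of scale and of weighting scheme shows up in performance.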

  • Research Article
  • 10.2196/70001
Heterogeneity in Effects of Automated Results Feedback After Online Depression Screening: Secondary Machine-Learning Based Analysis of the DISCOVER Trial
  • Aug 21, 2025
  • JMIR AI
  • Matthias Klee + 4 more

Background: Online depression screening tools may increase uptake of evidence-based care and consequently lead to symptom reduction. However, results of the DISCOVER trial suggested no effect of automated results feedback, compared with no feedback, after online depression screening on depressive symptom reduction six months after screening. Interpersonal variation in symptom representation, health care needs, and treatment preferences may nonetheless have led to differential responses to feedback mode at the individual level. Objective: The aim of this study was to examine heterogeneity of treatment effects (HTE), that is, differential responses to two feedback modes (tailored or nontailored) versus no feedback (control) following online depression screening. Methods: We used causal forests, a machine learning method that applies recursive partitioning to estimate conditional average treatment effects (CATEs). In this secondary data analysis of the DISCOVER trial, eligible participants screened positive for at least moderate depression severity but had not been diagnosed or treated for depression in the preceding year. The primary outcome was heterogeneity in depression severity change over a six-month follow-up period, measured with the Patient Health Questionnaire-9. The analysis comprised exploration of average treatment effects (ATE), HTE (operationalized with the area under the targeting operator characteristic curve, AUTOC), and differences in ATE when allocating feedback based on predicted CATE. We extracted the top predictors of depression severity change given feedback and explored high-CATE covariate profiles. Prior to analysis, the data were split into training and test sets (1:1) to minimize the risk of overfitting and to evaluate predictions on held-out test data. Results: Data from 946 participants of the DISCOVER trial without missing data were analyzed. 
We did not detect HTE: no versus nontailored feedback, AUTOC −0.48 (95% CI −1.62 to 0.67, P=.41); no versus tailored feedback, AUTOC 0.06 (95% CI −1.21 to 1.33, P=.93); and no versus any feedback, AUTOC −0.20 (95% CI −1.30 to 0.89, P=.72). There was no evidence of a change in the ATE in the test set when allocating feedback (tailored or nontailored) based on the predicted CATE. By examining covariate profiles, we observed a potentially detrimental role of control beliefs, given feedback compared with no feedback. Conclusions: We applied causal forests to describe higher-level interactions among a broad range of predictors to detect HTE. In the absence of evidence for HTE, treatment prioritization based on trained models did not improve ATEs. We did not find evidence of harm or benefit from providing tailored or nontailored feedback after online depression screening regarding depression severity change after six months. Future studies may test whether screening alone prompts behavioral activation and downstream depression severity reduction, considering the observed uniform changes across groups.

  • Research Article
  • 10.31315/telematika.v21i2.13006
Integrating Multiple Machine Learning Models to Predict Heart Failure Risk
  • Jun 20, 2024
  • Telematika
  • Tuahta Hasiholan Pinem + 1 more

This research aims to develop and evaluate machine learning models for heart failure prognosis based on patient medical information. Predictive models were built using algorithms such as logistic regression, decision trees, random forests, K-nearest neighbors, naive Bayes, support vector machines (SVMs), neural networks, and ensemble voting classifiers. The dataset comprises diverse clinical characteristics of patients diagnosed with heart failure. The data were divided into training and testing sets in an 80:20 ratio. Model performance was assessed with accuracy, cross-validation score, and ROC AUC score. The Voting Classifier, which combines the Logistic Regression and Support Vector Classifier models, performed best, with an accuracy of 88.04%, a cross-validation score of 91.01%, and a ROC AUC score of 88.00%. Further analysis suggested that blood pressure and cholesterol levels are substantial indicators of heart failure. This study presents a notable advancement in the use of machine learning models for heart failure prediction by comparing diverse algorithms and identifying the most pertinent clinical characteristics. These outcomes suggest potential for machine learning-driven clinical tools that facilitate early detection and improve medical interventions.
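The study splits the data 80:20 and reports a voting classifier combining logistic regression and a support vector classifier. The abstract does not say whether hard or soft voting was used; this sketch assumes soft voting (averaging each model's predicted probability of heart failure) and uses stand-in models, so `split_80_20` and `soft_vote` are hypothetical helpers rather than any library's API.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")
ProbModel = Callable[[Sequence[float]], float]  # returns P(heart failure | features)


def split_80_20(items: Sequence[T], seed: int = 0) -> tuple[list[T], list[T]]:
    """Shuffle with a fixed seed, then cut 80% for training and 20% for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def soft_vote(
    models: Sequence[ProbModel],
    features: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> tuple[float, int]:
    """Weighted average of per-model probabilities, plus the thresholded label."""
    if weights is None:
        weights = [1.0] * len(models)
    p = sum(w * m(features) for w, m in zip(weights, models)) / sum(weights)
    return p, int(p >= threshold)
```

In scikit-learn the corresponding tool is `sklearn.ensemble.VotingClassifier` with `voting="soft"`, which needs probability estimates from every member, so an `SVC` would have to be created with `probability=True`.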

  • Front Matter
  • Cited by 45
  • 10.1088/0967-3334/33/9/e01
Signal quality in cardiorespiratory monitoring
  • Aug 17, 2012
  • Physiological Measurement
  • Gari D Clifford + 1 more

This focus issue of Physiological Measurement follows the 38th Annual International Computing in Cardiology (CinC) Conference, hosted in Hangzhou, China in September 2011 by Zhejiang University. Each year, the NIH-sponsored PhysioNet resource (http://physionet.org/) runs an open competition lasting several months, aimed at encouraging the development of solutions to an unsolved or poorly solved problem in biomedicine, in most cases making use of relevant clinical and experimental data provided freely by PhysioNet. Participants in these annual challenges discuss their diverse approaches to the Challenge problems during dedicated scientific sessions at CinC. The topics of these PhysioNet/CinC Challenges range from physiologic signal processing and analysis to forecasting and modelling clinically important events and processes.

  • Research Article
  • Cited by 8
  • 10.1007/s10579-023-09650-7
Finnish parliament ASR corpus
  • Mar 27, 2023
  • Language Resources and Evaluation
  • Anja Virkkunen + 3 more

Public sources like parliament meeting recordings and transcripts provide ever-growing material for the training and evaluation of automatic speech recognition (ASR) systems. In this paper, we publish and analyse the Finnish Parliament ASR Corpus, the most extensive publicly available collection of manually transcribed speech data for Finnish with over 3000 h of speech and 449 speakers for which it provides rich demographic metadata. This corpus builds on earlier initial work, and as a result the corpus has a natural split into two training subsets from two periods of time. Similarly, there are two official, corrected test sets covering different times, setting an ASR task with longitudinal distribution-shift characteristics. An official development set is also provided. We developed a complete Kaldi-based data preparation pipeline and ASR recipes for hidden Markov models (HMM), hybrid deep neural networks (HMM-DNN), and attention-based encoder-decoders (AED). For HMM-DNN systems, we provide results with time-delay neural networks (TDNN) as well as state-of-the-art wav2vec 2.0 pretrained acoustic models. We set benchmarks on the official test sets and multiple other recently used test sets. Both temporal corpus subsets are already large, and we observe that beyond their scale, HMM-TDNN ASR performance on the official test sets has reached a plateau. In contrast, other domains and larger wav2vec 2.0 models benefit from added data. The HMM-DNN and AED approaches are compared in a carefully matched equal data setting, with the HMM-DNN system consistently performing better. Finally, the variation of the ASR accuracy is compared between the speaker categories available in the parliament metadata to detect potential biases based on factors such as gender, age, and education.

  • Research Article
  • Cited by 1
  • 10.4274/dir.2025.242999
Automatic bone age assessment: a Turkish population study.
  • Mar 17, 2025
  • Diagnostic and interventional radiology (Ankara, Turkey)
  • Samet Öztürk + 5 more

Established methods for bone age assessment (BAA), such as the Greulich and Pyle atlas, suffer from variability due to population differences and observer discrepancies. Although automated BAA offers speed and consistency, limited research exists on its performance across different populations using deep learning. This study examines deep learning algorithms on the Turkish population to enhance bone age models by understanding demographic influences. We analyzed reports from Bağcılar Hospital's Health Information Management System between April 2012 and September 2023 using "bone age" as a keyword. Patient images were re-evaluated by an experienced radiologist and anonymized. A total of 2,730 hand radiographs from Bağcılar Hospital (Turkish population), 12,572 from the Radiological Society of North America (RSNA), and 6,185 from the Radiological Hand Pose Estimation (RHPE) public datasets were collected, along with corresponding bone ages and gender information. A random set of 546 radiographs (273 from Bağcılar, 273 from the public datasets) was first set aside as an internal test set, stratified by bone age; the remaining data were used for training and validation. BAAs were generated using a modified InceptionV3 model on 500 × 500-pixel images, selecting the model with the lowest mean absolute error (MAE) on the validation set. Three models were trained and tested based on dataset origin: Bağcılar (Turkish), public (RSNA-RHPE), and a Combined model. On the internal test set, the Combined model's estimates were within 6, 12, 18, and 24 months of the reference bone age for 44%, 73%, 87%, and 94% of radiographs, respectively. The MAE was 9.2 months on the overall internal test set, 7 months on the public test set, and 11.5 months on the Bağcılar internal test data. The Bağcılar-only model had an MAE of 12.7 months on the Bağcılar internal test data. 
Although the Bağcılar-only model was trained on less data, there was no significant difference between the Combined and Bağcılar models on the Bağcılar dataset (P > 0.05). The public model showed an MAE of 16.5 months on the Bağcılar dataset, significantly worse than the other models (P < 0.05). We developed an automatic BAA model that includes the Turkish population, one of the few such deep learning studies. Despite challenges from population differences and data heterogeneity, these models can be effectively used in various clinical settings. Model accuracy may improve over time as data accumulate, and publicly available datasets may further refine the models. Our approach enables more accurate and efficient BAAs, supporting healthcare professionals where traditional methods are time-consuming and variable. The developed automated BAA model for the Turkish population offers a reliable and efficient alternative to traditional methods. By using deep learning with diverse datasets from Bağcılar Hospital and publicly available sources, the model minimizes assessment time and reduces variability. This advancement enhances clinical decision-making, supports standardized BAA practices, and improves patient care in various healthcare settings.
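The Combined model is summarized by its MAE and by the share of estimates falling within 6, 12, 18, and 24 months of the reference bone age. A minimal sketch of both summaries, assuming ages in months and a strict "less than" comparison at each threshold; the function names and sample values are illustrative.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute difference between estimated and reference bone age (months)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def within_threshold_rates(
    pred: Sequence[float],
    ref: Sequence[float],
    thresholds: Sequence[float] = (6, 12, 18, 24),
) -> dict[float, float]:
    """Fraction of estimates whose absolute error is strictly below each threshold."""
    errors = [abs(p - r) for p, r in zip(pred, ref)]
    return {t: sum(e < t for e in errors) / len(errors) for t in thresholds}
```

The threshold rates complement the MAE: two models with the same MAE can differ in how often they make clinically large errors, which the 18- and 24-month rates expose.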

  • Research Article
  • Cited by 9
  • 10.9734/air/2020/v21i930238
Assessment of the Different Machine Learning Models for Prediction of Cluster Bean (Cyamopsis tetragonoloba L. Taub.) Yield
  • Aug 27, 2020
  • Advances in Research
  • Darshan Jagannath Pangarkar + 3 more

Crop yield prediction can help traders, agribusinesses, and government agencies plan their activities accordingly, and it can help government agencies manage situations such as over- or under-production. Traditionally, statistical and crop simulation methods have been used for this purpose, but machine learning models can also be of great help. The aim of the present study is to assess the predictive ability of various machine learning models for cluster bean (Cyamopsis tetragonoloba L. Taub.) yield prediction. The models were applied to and tested on 19 years of panel data (1999-2000 to 2017-18) for the Bikaner district of Rajasthan, and several data mining steps were performed before model building. K-Nearest Neighbors (K-NN), Support Vector Regression (SVR) with various kernels, and Random Forest regression were applied. Cross-validation was also performed to assess out-of-sample validity. The best-fitting model was chosen based on cross-validation scores and R² values. In addition to the coefficient of determination (R²), the root mean squared error (RMSE), mean absolute error (MAE), and root relative squared error (RRSE) were calculated on the testing set. Support vector regression with a linear kernel had the lowest RMSE (23.19), RRSE (0.14), and MAE (19.27), followed by random forest regression and second-degree polynomial support vector regression with gamma = auto. The R² ranking differed slightly, placing linear-kernel support vector regression first (98.31%), followed by second-degree polynomial support vector regression with gamma = auto (89.83%) and with gamma = scale (88.83%). Under two-fold cross-validation, support vector regression with a linear kernel had the highest cross-validation score, explaining 71% (±0.03), followed by second-degree polynomial support vector regression with gamma = auto and random forest regression.
K-NN and support vector regression with a radial basis function (RBF) kernel had negative cross-validation scores. Support vector regression with a linear kernel was found to be the best-fitting model for yield prediction, as it had the highest sample validity (98.31%) and global validity (71%).
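The four test-set metrics named in this abstract have standard definitions. The sketch below computes them from paired observed and predicted yields using the usual formulas, with RRSE taken as the square root of the residual sum of squares divided by the total sum of squares around the observed mean; the function name is hypothetical and this is not the authors' code. Under these definitions RRSE² = 1 - R², and R² becomes negative whenever a model predicts worse than the observed mean, which is one way an R²-based cross-validation score can fall below zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, RRSE, and R^2 for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("y_true is constant; RRSE and R^2 are undefined")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "rrse": math.sqrt(ss_res / ss_tot),
        "r2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    # Always predicting the observed mean: r2 == 0 and rrse == 1.
    print(regression_metrics(observed, [2.5] * 4))
    # Predictions worse than the mean: r2 is negative.
    print(regression_metrics(observed, [4.0, 3.0, 2.0, 1.0]))
```

RRSE and R² are both normalized by the spread of the observed values, which makes them comparable across datasets, whereas RMSE and MAE stay in the units of the yield itself.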

  • Conference Article
  • Cited by 6
  • 10.2118/202047-ms
Integration of Petrophysical Log Data with Computational Intelligence for the Development of a Lithology Predictor
  • Oct 26, 2020
  • Syed Muhammad Amir + 3 more

Incorrect manual interpretation of log data regarding formation type and other important information can be catastrophic for the operating company. In this work, we address the interpretation of formation type from log data using machine learning (ML), a branch of artificial intelligence. As a result, we have successfully developed a program that accurately predicts formation type. Following the conventional ML practice of splitting the data into training, validation, and test sets, we fitted six different ML algorithms to the training data and then evaluated their prediction accuracy with cross-validation scores and cross-validation predictions, which test the performance of the classifiers (ML algorithms) on the validation set. The three best-performing classifiers were selected and further improved through a hyperparameter search. These improved classifiers were then tested on unseen data for a comparative analysis. Measured with receiver operating characteristic (ROC) scores and the area under the ROC curve (ROC-AUC), our prediction accuracy for each formation type lies in the range of 95-99%, except for shaly sandstone and shale (50% and 84%, respectively). This appeared to be due to underfitting: during training, the classifiers did not see enough instances of these formation types to learn which characteristics of the data identify a formation as shaly sandstone or shale. The underfitting was confirmed by inspecting the data. To resolve this problem, we suggest training the classifiers on a larger dataset with more targets (formation types). Furthermore, during the data-cleaning (prior to classifier training) and data-analysis phases, we discovered important relationships between well logs and determined the relative importance of each well log for different formations.
This observation merits further investigation, as it could help eliminate the need for multiple well logs for some formations (based on prior geological knowledge) and reduce the cost of well-logging operations. By running our program on a larger well-log dataset with more instances of each formation type, we can train the classifiers to predict every formation type accurately. The program is flexible: given a different target, such as formation fluid type instead of (or together with) formation type, it can successfully predict either or both targets. Increasing the number of data instances improved training and thus yielded more accurate predictions. Using the program will make formation evaluation easier, faster, more automated, and more precise.
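The abstract reports a ROC-AUC for each formation type but does not say how per-class scores were obtained from the multiclass classifiers. One common approach, assumed here, is one-vs-rest: each formation type in turn is treated as the positive class and all others as negative. The sketch computes AUC directly as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting half; the function names and the input layout (a dict of per-class score lists) are hypothetical. On this scale 0.5 corresponds to chance-level ranking, so if the 50% reported for shaly sandstone is an AUC, the classifiers ranked that formation no better than chance.

```python
def roc_auc_binary(is_positive, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    if len(is_positive) != len(scores):
        raise ValueError("labels and scores must have equal length")
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, class_scores):
    """Per-class ROC-AUC, treating each class in turn as the positive class.

    y_true: true class label for each sample.
    class_scores: hypothetical layout mapping class label -> list of that
    class's predicted scores, aligned with y_true.
    """
    return {
        label: roc_auc_binary([y == label for y in y_true], scores)
        for label, scores in class_scores.items()
    }


if __name__ == "__main__":
    # Toy labels and scores for illustration only.
    y = ["shale", "sandstone", "shale", "limestone"]
    scores = {
        "shale": [0.9, 0.2, 0.7, 0.4],
        "sandstone": [0.1, 0.6, 0.3, 0.6],
    }
    print(one_vs_rest_auc(y, scores))
```

The pairwise loop costs O(P·N) for P positives and N negatives and is written for clarity; rank-based (Mann-Whitney U) formulations give the same value in O(n log n) time.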
