Articles published on Classification Model
- New
- Research Article
- 10.1002/cphc.202500501
- Nov 8, 2025
- Chemphyschem : a European journal of chemical physics and physical chemistry
- Shufang Li + 3 more
Determining the luminescent colors of phosphors used in display and lighting applications is a crucial step in discovering new functional luminescent materials. This study collects the experimental conditions under which Ca11(SiO4)4(BO3)2 (CSB) phosphor produces different colors. Using 12 commonly used machine learning models and a stacking ensemble model, the luminescence colors of CSB with different ion doping are accurately predicted, and the reliability of the results is verified through experiments. The findings demonstrate that the stacking ensemble model improves predictive performance over the single best model, with overall accuracy, precision, recall, and F1 score of 98.19%, 98.27%, 98.10%, and 98.18%, respectively. It is the best stacking ensemble model currently known. Compared with the single best classification model, the stacking ensemble model achieves relative improvements of ≈3.55%, 2.99%, 3.46%, and 3.89%, respectively. In addition, the Commission Internationale de L'Eclairage (CIE) chromaticity diagram of the luminescence color of phosphors is successfully predicted by applying a clustering method to the output of the stacking model, and experiments further verify the generalization performance of the model. The results show that the stacking ensemble model predicts phosphor luminescence colors with high precision and speed and has great potential for optimizing luminescence properties.
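The relative improvements quoted in this abstract can be read as (new − old) / old. The sketch below assumes that reading and back-calculates the single-model scores it would imply; those baselines are not reported in the abstract, and the function names are illustrative.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


def implied_baseline(improved: float, relative_gain_pct: float) -> float:
    """Invert the formula above to recover the baseline score."""
    return improved / (1.0 + relative_gain_pct / 100.0)


# Stacking scores (%) and relative gains (%) as stated in the abstract.
stacking = {"accuracy": 98.19, "precision": 98.27, "recall": 98.10, "f1": 98.18}
gains = {"accuracy": 3.55, "precision": 2.99, "recall": 3.46, "f1": 3.89}

# Back-calculated under the assumption above; NOT values reported by the authors.
baselines = {name: implied_baseline(stacking[name], gains[name]) for name in stacking}

for name, value in baselines.items():
    print(f"{name}: implied single-model score of about {value:.2f}%")
```

If the authors instead meant absolute percentage-point gains, the implied baselines would simply be the stacking score minus the gain.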
- New
- Research Article
- 10.1186/s13018-025-06386-8
- Nov 7, 2025
- Journal of orthopaedic surgery and research
- Ali Yalcinkaya + 5 more
Core Outcome Sets (COS) are essential for standardizing outcome reporting in clinical research, yet their development remains resource-intensive and time-consuming. Traditional COS development requires months of expert work for manual outcome extraction and classification from literature. While machine learning (ML) has shown promise in automating systematic reviews, its application to COS development, particularly for outcome identification and classification, remains underexplored. This study evaluates whether ML models can accurately extract and classify verbatim outcomes from clinical studies according to the COMET taxonomy and determines the amount of manually annotated data needed to support reliable model performance. We developed an ML pipeline using a dataset of 149 full-text studies on lower limb lengthening surgery. The pipeline comprised a Sentence-BERT-based extraction model for identifying verbatim outcomes and a classification model for assigning outcomes to COMET taxonomy domains. We systematically assessed performance using training sets ranging from 5 to 85 articles to establish a practical threshold for reliable model behavior. Model performance was validated using a 28-article hold-out set with standard metrics: precision, recall, and F1-score. A training size of 20 articles proved sufficient for stable model performance. The extraction model achieved an F1-score of 94% with precision and recall above 90%. The classification model attained a weighted-average F1-score of 86%, with 87% precision and 88% recall. When applied to the full dataset, the system successfully identified 94% of manually extracted outcomes. The distribution of outcome domains identified by ML closely mirrored manual classification with high accuracy. This study demonstrates the feasibility of applying ML-based outcome extraction and classification within a specific COS development context for lower limb lengthening surgery. 
By reducing annotation requirements from 149 to just 20 articles while maintaining high accuracy, our approach offers a scalable, reproducible solution that substantially reduces the manual workload in COS development. This pipeline can play a significant role in streamlining evidence synthesis processes, potentially accelerating the generation of outcome lists for consensus-building exercises in COS development.
- New
- Research Article
- 10.1097/md.0000000000045611
- Nov 7, 2025
- Medicine
- Xiaolin Xu + 5 more
Acute myocardial infarction (AMI) remains a major cause of cardiovascular-related disability and mortality globally. Previous studies have indicated that there is a close interaction between immune responses and mitochondrial metabolism, which may affect the occurrence and development of AMI. Exploring these interactions is crucial for discovering new biomarkers and therapeutic targets. We retrieved gene expression data from Gene Expression Omnibus, employing differential expression analysis, enrichment analysis, weighted gene co-expression network analysis, and machine learning to identify mitochondria-related hub genes in AMI. The nomogram model was developed for diagnosis. Cell-type Identification by Estimating Relative Subsets of RNA Transcripts and Pearson correlation analyses were conducted to explore the relationship between these hub genes and immune cells. Gene set enrichment analysis was conducted to explore mitochondrial metabolism pathway enrichment in immune cells using single-cell sequencing data. Drug predictions were made using the EnrichR platform. Real-time quantitative polymerase chain reaction validated the expression levels of the identified hub genes. Five mitochondria-related hub genes with diagnostic potential for AMI were identified. Both classification and nomogram models exhibited good diagnostic performance. Subsequent validation via real‑time quantitative polymerase chain reaction confirmed significant upregulation of ACSL1, ALDH2, C15orf48, SLC25A37, and CYP27A1 in AMI (P < .05). Significant differences in 13 types of immune cells were observed between AMI and controls, with the 5 hub genes significantly associated with various immune cells. Most of the mitochondrial metabolism-related pathways were significantly upregulated in T cells, B cells, and dendritic cells. 
This exploratory study provides preliminary insight into the interplay between mitochondrial metabolism and immunity in AMI and highlights a set of potential candidate biomarkers that may support AMI diagnosis. However, rigorous external validation is essential before any clinical application can be considered.
- New
- Research Article
- 10.1038/s41467-025-64812-1
- Nov 7, 2025
- Nature communications
- Yueming Yin + 9 more
Single-molecule localization microscopy enables high-resolution imaging of molecular interactions, but discriminating molecular binding types has traditionally relied on complex strategies, such as multiple dyes, time-division techniques, or kinetic analysis, that are asynchronous, invasive, or time-consuming. Here, we uncover previously overlooked spatiotemporal information embedded within diffraction-limited fluorescence, enabling synchronous classification of individual binding event videos using only a single fluorescent dye. Building on this insight, we propose a Temporal-to-Context Convolutional Neural Network (T2C CNN), which integrates long-term spatial convolutions, shallow cross-connected blocks, and a pooling-free structure to enhance contextual representation while preserving fine-grained temporal features. Applied to DNA-PAINT experiments, T2C CNN achieves up to 94.76% classification accuracy and outperforms state-of-the-art video classification models by 15-25 percentage points. Our approach enables rapid and precise binding-type recognition from fluorescence video data, reducing observation time from minutes to seconds and facilitating high-throughput single-molecule imaging without requiring multiple dye channels or extended kinetic measurements.
- New
- Research Article
- 10.1109/tbme.2025.3628167
- Nov 7, 2025
- IEEE transactions on bio-medical engineering
- Xingwei An + 4 more
With the advancement of neuroscience and computer science, electroencephalography (EEG) has drawn increasing attention as a promising modality for biometric identification, owing to its universality, permanence, and security. However, existing studies have pointed out that maintaining stable and temporally robust inter-individual features remains a major challenge in EEG-based identification. Therefore, developing effective methods for cross-time EEG-based identity recognition is essential for achieving reliable and practical biometric systems. In this study, we propose a novel EEG-based identification framework grounded in symmetric positive definite (SPD) manifolds. Specifically, we utilize the spatial covariance matrices of EEG signals to represent individual differences and introduce an enhanced feature extraction method (E-SPD-M) that simultaneously captures temporal, spatial, and spectral characteristics. These matrices are embedded into the Riemannian manifold to construct a discriminative representation space. For each subject, we build a personalized classification model and integrate their outputs to achieve accurate identification. Furthermore, we construct a comprehensive multi-task, cross-time EEG dataset and validate our approach on both our dataset and a publicly available longitudinal EEG dataset (M3CV). Experimental results demonstrate that our method achieves superior cross-time identification performance. Overall, this work offers a novel pathway for improving EEG-based biometric algorithms and extending the application of Riemannian geometry in the field.
- New
- Research Article
- 10.3390/diagnostics15212808
- Nov 6, 2025
- Diagnostics
- Joshua Mijares + 3 more
Background: Artificial intelligence (AI) has shown significant promise in augmenting diagnostic capabilities across medical specialties. Recent advancements in generative AI allow for synthesis and interpretation of complex clinical data including imaging and patient history to assess disease risk. Objective: To evaluate the diagnostic performance of a dermatology-trained multimodal large language model (DermFlow, Delaware, USA) in assessing malignancy risk of pigmented skin lesions. Methods: This retrospective study utilized data from 59 patients with 68 biopsy-proven pigmented skin lesions seen at Indiana University clinics from February 2023 to May 2025. De-identified patient histories and clinical images were input into DermFlow, and clinical images only were input into Claude Sonnet 4 (Claude) to generate differential diagnoses. Clinician pre-operative diagnoses were extracted from the clinical note. Assessments were compared to histopathologic diagnoses (gold standard). Results: Among 68 clinically concerning pigmented lesions, DermFlow achieved 47.1% top diagnosis accuracy and 92.6% any-diagnosis accuracy, with F1 = 0.948, sensitivity 93.9%, and specificity 89.5% (balanced accuracy 91.7%). Claude had 8.8% top diagnosis and 73.5% any-diagnosis accuracy, F1 = 0.816, sensitivity 81.6%, specificity 52.6% (balanced accuracy 67.1%). Clinicians achieved 38.2% top diagnosis and 72.1% any-diagnosis accuracy, F1 = 0.776, sensitivity 67.3%, specificity 84.2% (balanced accuracy 75.8%). DermFlow recommended biopsy in 95.6% of cases vs. 82.4% for Claude, with multiple pairwise differences favoring DermFlow (p < 0.05). Conclusions: DermFlow demonstrated comparable or superior diagnostic performance to clinicians and superior performance to Claude in evaluating pigmented skin lesions. 
Although additional data must be gathered to further validate the model in real clinical settings, these initial findings suggest potential utility for dermatology-trained AI models in clinical practice, particularly in settings with limited dermatologist availability.
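The balanced accuracies in this abstract are consistent with the usual definition, the mean of sensitivity and specificity. A minimal check using only the percentages stated above (the dictionary layout is illustrative):

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy as the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# (sensitivity %, specificity %, balanced accuracy % as stated in the abstract)
reported = {
    "DermFlow": (93.9, 89.5, 91.7),
    "Claude": (81.6, 52.6, 67.1),
    "Clinicians": (67.3, 84.2, 75.8),
}

computed = {
    rater: balanced_accuracy(sens, spec)
    for rater, (sens, spec, _stated) in reported.items()
}

for rater, value in computed.items():
    stated = reported[rater][2]
    print(f"{rater}: computed {value:.2f}%, stated {stated}%")
```

Small differences in the second decimal come from the inputs already being rounded to one decimal place.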
- New
- Research Article
- 10.1088/1361-6501/ae1c5c
- Nov 6, 2025
- Measurement Science and Technology
- Chenhui Qian + 5 more
As an effective transfer method for domain adaptation, maximum classifier discrepancy (MCD) uses classifier discrepancy to guide the optimization of decision boundaries. However, guidance based on classifier discrepancy narrows the decision boundary between the two classifiers, which weakens their discrimination. Thus, MCD leads to a decrease in the classification accuracy of the model. To preserve the decision-making capacity of the classifiers, an uneven maximum classifier discrepancy (GCN-KAN-UNCD) combining graph convolutional network (GCN) and Kolmogorov-Arnold networks (KAN) is proposed for cross-domain fault diagnosis of bearings. First, the GCN-KAN model is constructed by combining GCN and KAN to improve the model's feature extraction and classification ability. Second, the discrepancy of the two classifiers is replaced with a metric function to guide the model for domain adaptation. Thus, the fault classification ability of a classifier is preserved. Finally, six transfer tasks on the Paderborn datasets show that the diagnostic accuracy of GCN-KAN-UNCD exceeds the average SSAGCN accuracy of 15.98% and outperforms existing cross-domain diagnostic methods.
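For orientation, the classifier discrepancy in standard MCD is commonly computed as the mean absolute difference between the two classifiers' softmax outputs on target-domain samples. The sketch below shows only that baseline quantity; the metric function that GCN-KAN-UNCD substitutes for it is not specified in the abstract, and the function names here are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_discrepancy(logits_a, logits_b):
    """Mean absolute difference between two classifiers' class probabilities.

    logits_a, logits_b: lists of per-sample logit vectors from the two
    classifier heads, evaluated on the same target-domain samples.
    """
    total = 0.0
    count = 0
    for sample_a, sample_b in zip(logits_a, logits_b):
        probs_a, probs_b = softmax(sample_a), softmax(sample_b)
        total += sum(abs(pa - pb) for pa, pb in zip(probs_a, probs_b))
        count += len(probs_a)
    return total / count
```

In MCD training the classifier heads are updated to maximize this value on target data while the feature extractor is updated to minimize it.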
- New
- Research Article
- 10.3390/biomechanics5040094
- Nov 6, 2025
- Biomechanics
- Xishi Zhu + 4 more
Background/Objectives: Outcomes following Anterior Cruciate Ligament (ACL) reconstruction vary widely among patients, yet existing classification techniques often lack transparency and clinical interpretability. To address this gap, we developed a multi-modal framework that integrates gait dynamics with patient-specific characteristics to enhance personalized assessment of ACL reconstruction outcomes. Methods: Participants, both post-ACL reconstruction and healthy controls, were equipped with inertial measurement unit (IMU) sensors on bilateral wrists, ankles, and the sacrum during standardized locomotion tasks. Using the Phase Slope Index (PSI), we quantified causal relationships between sensor pairs, hypothesizing that (1) PSI-derived metrics capture discriminative biomechanical interactions; (2) task-specific differences in segment coordination patterns influence model performance; and (3) recovery duration modulates classifier confidence and the structure of high-dimensional data distributions. Classification models were trained using PSI features, and permutation-based sensor importance analyses were conducted to interpret task-specific biomechanical contributions. Results: PSI-based classifiers achieved 96.37% accuracy in distinguishing ACL reconstruction outcomes, validating the first hypothesis. Permutation importance revealed that jogging tasks produced more focused importance distributions across fewer sensor pairs while improving accuracy, confirming task-specific coordination effects (hypothesis two). Visualization via t-SNE demonstrated that longer recovery durations corresponded to reduced model confidence but more coherent feature clusters, supporting the third hypothesis. Conclusions: By integrating causal gait metrics and patient recovery profiles, this approach enables interpretable and high-performing ACL outcome prediction. 
Quantitative evaluation measures—including model confidence and t-SNE cluster coherence—offer clinicians objective tools for personalized rehabilitation monitoring and data-driven return-to-sport decisions.
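Permutation-based importance, as used above to rank sensor pairs, can be sketched generically: shuffle one feature column, re-score the model, and record the accuracy drop. The toy data and threshold classifier below are stand-ins, not the study's PSI features or models.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only this column replaced.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(base - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical two-feature data; the classifier only looks at feature 0.
rows = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3], [0.1, 0.9], [0.7, 0.5], [0.3, 0.2]]
labels = [1, 1, 0, 0, 1, 0]
predict = lambda r: int(r[0] > 0.5)

scores = permutation_importance(predict, rows, labels)
print(scores)  # feature 1 is ignored by the classifier, so its score is 0
```

A feature whose shuffling leaves accuracy unchanged contributes nothing to the decision, which is what a "focused" importance distribution across few sensor pairs would look like.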
- New
- Research Article
- 10.3389/frai.2025.1665798
- Nov 6, 2025
- Frontiers in Artificial Intelligence
- Hao Chen + 9 more
The rapid growth of social media has resulted in an explosion of online news content, leading to a significant increase in the spread of misleading or false information. While machine learning techniques have been widely applied to detect fake news, the scarcity of labeled datasets remains a critical challenge. Misinformation frequently appears as paired text and images, where a news article or headline is accompanied by related visuals. In this paper, we introduce a self-learning multimodal model for fake news classification. The model leverages contrastive learning, a robust method for feature extraction that operates without requiring labeled data, and integrates the strengths of Large Language Models (LLMs) to jointly analyze both text and image features. LLMs excel at this task due to their ability to process diverse linguistic data drawn from extensive training corpora. Our experimental results on a public dataset demonstrate that the proposed model outperforms several state-of-the-art classification approaches, achieving over 85% accuracy, precision, recall, and F1-score. These findings highlight the model's effectiveness in tackling the challenges of multimodal fake news detection.
- New
- Research Article
- 10.1111/1556-4029.70189
- Nov 6, 2025
- Journal of forensic sciences
- Pratibha Amol Tambewagh + 1 more
Malware detection and classification in network traffic is a critical challenge in cybersecurity, with evolving threats that traditional methods struggle to address. As network traffic becomes more complex, accurately identifying malicious activities while minimizing false positives is essential for real-time monitoring systems. This study aims to enhance malware detection using deep learning (DL) techniques, focusing on improving accuracy, reducing false positives, and enabling real-time detection in dynamic network environments. Several advanced DL techniques are introduced to address these challenges. Entropy-Based Traffic Filtering (ETF) measures the randomness in network traffic to identify anomalies and malicious patterns, reducing noise and improving feature extraction. Self-Supervised Learning for Anomaly Detection (SSLAD) detects malware without labeled data by learning normal traffic patterns and identifying anomalies, thus improving the detection of unknown threats. Graph Neural Networks for Malware Traffic Classification (GNN-MTC) model network traffic as graphs, where devices are nodes and communications are edges, capturing relational dependencies and anomalies to detect complex attack patterns like botnets and command-and-control (C2) communications. Context-Aware Graph Attention Networks (CA-GAT) further enhance detection by analyzing traffic as graphs while incorporating contextual factors like time and behavior, focusing on relevant interactions to improve attack detection. The proposed DL model achieves 98% accuracy, surpassing DeepMAL (95%) and an entropy-based method by Huang et al. (97.3%). Its strong precision and recall demonstrate superior performance in detecting known and novel malware, making it well-suited for real-time network security applications. The model was implemented using Python. 
Future research could focus on integrating real-time adaptive learning models, exploring hybrid DL architectures, and enhancing cross-platform malware detection, ensuring scalability and robustness in evolving network security environments.
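The abstract does not say how ETF measures randomness. One common choice is the Shannon entropy of the payload's byte distribution, sketched below as an assumption; the 7.0-bit threshold and the function names are placeholders, not values from the paper.

```python
import math
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def flag_high_entropy(payloads, threshold=7.0):
    """Return payloads whose entropy exceeds `threshold` (placeholder value)."""
    return [p for p in payloads if shannon_entropy(p) > threshold]


uniform = bytes(range(256))   # every byte value once: maximal entropy
constant = b"aaaa"            # a single repeated byte: zero entropy
print(shannon_entropy(uniform), shannon_entropy(constant))
```

High byte entropy is typical of encrypted or packed content, so in practice such a filter would only be one signal among several rather than a verdict on its own.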
- New
- Research Article
- 10.1080/10589759.2025.2584627
- Nov 6, 2025
- Nondestructive Testing and Evaluation
- Peijian Jin + 3 more
Low-temperature charging induces lithium plating and stress accumulation, posing severe safety challenges for lithium-ion batteries (LIBs). However, the mechanisms underlying their internal damage evolution and failure mode transitions remain unclear. This study employs acoustic emission (AE) technology to develop an innovative hybrid classification model integrating K-means clustering, linear classification, and Gaussian kernel support vector machines. This approach enables adaptive recognition and dynamic tracking of multiple damage modes in LIBs. Results indicate that AE signals during charging exhibit distinctive dual-burst waveform characteristics. Pearson correlation analysis reveals that the two burst signals share similar waveform features, originating from correlated waveforms of the same damage event. Classification results reveal that damage modes evolve from tensile-dominated patterns in the early charging stage to shear and mixed modes in the later stage. Furthermore, the coupled effects of low temperature and high charging rates significantly accelerate the accumulation of shear and mixed damage. In addition, continuous wavelet transform (CWT) analysis reveals a time-frequency evolution pattern in which AE signals transition from high-frequency short-duration to low-frequency long-duration signals, aligning with the transformation of damage modes. This study establishes a multiscale acoustic emission analysis framework integrating hybrid learning classification and time-frequency analysis, providing novel insights and technical support for elucidating low-temperature failure mechanisms and enabling early warning in lithium-ion batteries.
- New
- Research Article
- 10.3390/math13213569
- Nov 6, 2025
- Mathematics
- Qiang Han + 3 more
The pervasive spread of Android malware poses significant threats to users and systems worldwide. In most existing studies, differences in feature importance are often overlooked, and the calculation of feature weights is conducted independently of the classification model. In this paper, we propose an Android malware detection method, Leveraging Extraction Method and Soft Voting classification (LEMSOFT). This approach includes a novel preprocessing module, lexical occurrence ratio-based filtering (LORF), and an improved Soft Voting mechanism optimized through genetic algorithms. We introduce LORF to evaluate and enhance the significance of permissions, API calls, and opcodes. Each type of feature is then independently classified using tailored machine learning models. To integrate the outputs of these classifiers, this paper proposes an innovative soft voting mechanism that improves prediction accuracy for encountered applications by assigning weights through a genetic algorithm. Our solution outperforms the baseline methods we studied, as evidenced by the evaluation of 5560 malicious and 8340 benign applications, with an average accuracy of 99.89%. The efficacy of our methodology is demonstrated through extensive experiments, showcasing significant improvements in detection rates compared to state-of-the-art (SOTA) methods.
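Weighted soft voting combines per-classifier class probabilities into one weighted average before taking the argmax. The sketch below fixes the weights by hand; in LEMSOFT they are said to be tuned by a genetic algorithm, which is not reproduced here. All probabilities and weights are hypothetical.

```python
def soft_vote(prob_lists, weights):
    """Weighted average of class-probability vectors, plus the winning class index."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    combined = [
        sum(w * probs[k] for w, probs in zip(weights, prob_lists)) / total
        for k in range(n_classes)
    ]
    label = max(range(n_classes), key=combined.__getitem__)
    return label, combined


# Hypothetical outputs [P(benign), P(malicious)] from one classifier per feature type.
permission_probs = [0.60, 0.40]
api_call_probs = [0.30, 0.70]
opcode_probs = [0.45, 0.55]

# Placeholder weights; a genetic algorithm would search over these.
weights = [0.2, 0.5, 0.3]

label, combined = soft_vote([permission_probs, api_call_probs, opcode_probs], weights)
print(label, combined)
```

Here the permission-based classifier leans benign, but the more heavily weighted API-call classifier pulls the combined vote toward the malicious class.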
- New
- Research Article
- 10.4108/eetinis.124.10405
- Nov 6, 2025
- EAI Endorsed Transactions on Industrial Networks and Intelligent Systems
- Tuyet-Nhi Thi Nguyen + 4 more
The primary objective of deep learning is to have good performance on a large dataset. However, when the model lacks sufficient data, it becomes a challenge to achieve high accuracy in predicting unfamiliar classes. In fact, real-world datasets often introduce new classes, and some types of data are difficult to collect or simulate, such as medical images. A subset of machine learning is meta learning, or "learning-to-learn", which can tackle these problems. In this paper, a few-shot classification model is proposed to classify three types of brain cancer: Glioma brain cancer, Meningioma brain cancer, and brain Tumor cancer. To achieve this, we employ an episodic meta-training paradigm that integrates the model-agnostic meta-learning (MAML) framework with a prototypical network (ProtoNet) to train the model. In detail, ProtoNet focuses on learning a metric space by computing distances to the prototype of each class, while MAML concentrates on finding the optimal initialization parameters so that the model can learn quickly from a few labeled samples. In addition, we compute and report the average accuracy for the baseline and our methods to assess the quality of the prediction confidence. Simulation results indicate that our proposed approach substantially surpasses the performance of the baseline ResNet18 model, achieving an average accuracy improvement from 46.33% to 92.08% across different few-shot settings. These findings highlight the potential of combining metric-based and optimization-based meta-learning techniques to improve diagnostic support in healthcare applications.
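The prototype step of a prototypical network can be sketched in a few lines: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The 2-D embeddings and short class labels below are made up for illustration; a real model would produce embeddings with a trained encoder, and the MAML initialization step is not shown.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def build_prototypes(support):
    """support: dict mapping class label -> list of support embeddings."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-shot support set with toy 2-D embeddings.
support = {
    "glioma": [[1.0, 0.0], [0.8, 0.2]],
    "meningioma": [[0.0, 1.0], [0.2, 0.8]],
    "tumor": [[-1.0, -1.0], [-0.8, -1.2]],
}
prototypes = build_prototypes(support)
print(classify([0.7, 0.3], prototypes))
```

ProtoNet normally applies a softmax over negative squared distances to get class probabilities; picking the nearest prototype gives the same predicted class.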
- New
- Research Article
- 10.3390/inventions10060101
- Nov 6, 2025
- Inventions
- Pornthep Phanbua + 3 more
Research shows that individuals with heart failure are 60% more likely to develop dementia because of their shared metabolic risk factors. Developing a classification model to differentiate between these two conditions effectively is crucial for improving diagnostic accuracy, guiding clinical decision-making, and supporting timely interventions in older adults. This study proposes a novel method for dementia classification, distinguishing it from its common comorbidity, heart failure, using blood testing and personal data. A dataset comprising 11,124 imbalanced electronic health records of older adults from hospitals in Chiang Rai, Thailand, was utilized. Conditional tabular generative adversarial networks (CTGANs) were employed to generate synthetic data while preserving key statistical relationships, diversity, and distributions of the original dataset. Two groups of ensemble models were analyzed: the boosting group—extreme gradient boosting, light gradient boosting machine—and the bagging group—random forest and extra trees. Performance metrics, including accuracy, precision, recall, F1-score, and area under the receiver-operating characteristic curve were evaluated. Compared with the synthetic minority oversampling technique, CTGAN-based synthetic data generation significantly enhanced the performance of ensemble learning models in classifying dementia and heart failure.
- New
- Research Article
- 10.1186/s40468-025-00409-1
- Nov 6, 2025
- Language Testing in Asia
- Apichat Khamboonruang
Investigating the applicability of a diagnostic classification model in a Thai EFL classroom writing assessment: a GDINA model study
- New
- Research Article
- 10.1007/s43538-025-00593-x
- Nov 6, 2025
- Proceedings of the Indian National Science Academy
- Ankur Tomar + 2 more
Comparative analysis of AlexNet and ResNet50-based classification models for crop wild relatives identification
- New
- Research Article
- 10.1088/1681-7575/ae1bae
- Nov 5, 2025
- Metrologia
- Samuel Bilson + 3 more
Machine learning (ML) classification models are increasingly being used in a wide range of applications where it is important that predictions are accompanied by uncertainties, including in climate and earth observation, medical diagnosis and bioaerosol monitoring. The output of an ML classification model is a type of categorical variable known as a nominal property in the International Vocabulary of Metrology (VIM). However, concepts related to uncertainty evaluation for nominal properties are not defined in the VIM, nor is such evaluation addressed by the Guide to the Expression of Uncertainty in Measurement (GUM). In this paper we propose a metrological conceptual uncertainty evaluation framework for nominal properties. This framework is based on probability mass functions and summary statistics thereof, and it is applicable to ML classification. We also illustrate its use in the context of two applications that exemplify the issues and have significant societal impact, namely, climate and earth observation and medical diagnosis. Our framework would enable an extension of the GUM to uncertainty for nominal properties, which would make both applicable to ML classification models.
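As an illustration of summarizing a classifier's output as a probability mass function, the sketch below reports the most probable class, its probability, and the Shannon entropy. Which summary statistics the paper's framework actually adopts is not stated in the abstract, so these are assumed examples, and the class labels are hypothetical.

```python
import math


def summarize_pmf(pmf):
    """Summarize a probability mass function over class labels.

    pmf: dict mapping label -> probability; probabilities must sum to 1.
    """
    total = sum(pmf.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {total}, expected 1")
    mode = max(pmf, key=pmf.get)
    # Zero-probability classes contribute nothing to the entropy.
    entropy = -sum(p * math.log2(p) for p in pmf.values() if p > 0)
    return {"mode": mode, "mode_probability": pmf[mode], "entropy_bits": entropy}


# Hypothetical classifier output for one earth-observation pixel.
summary = summarize_pmf({"cloud": 0.7, "clear": 0.2, "snow": 0.1})
print(summary)
```

The entropy is zero when one class has all the probability mass and largest when the mass is spread evenly, so it gives a single-number view of how uncertain the nominal value is.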
- New
- Research Article
- 10.1007/s00330-025-12097-9
- Nov 5, 2025
- European radiology
- Chaowei Ma + 7 more
This study presents a novel deep learning-machine learning fusion network for quantitative and interpretable assessment of chest X-ray positioning, aiming to analyze critical factors in patient positioning layout. In this retrospective study, we analyzed 3300 chest radiographs from a Chinese medical institution, collected between March 2021 and December 2022. The dataset was partitioned into the XJ_chest_21 subset for training the automated segmentation model and the XJ_chest_22 subset to validate three classification models: Random Forest Fusion Network (RFFN), Threshold Classification (TC), and Multivariate Logistic Regression (MLR). After automatically measuring five positioning indicators in the images, the data were input into the models to assess positioning quality. We compared the performance metrics of the three classification models, including AUC, accuracy, sensitivity, and specificity. SHAP (Shapley Additive Explanations) was utilized to interpret feature importance in the decision-making process of the RFFN model. We evaluated measurement consistency between the Automated Measurement Model (AMM) and radiologists. U-net++ demonstrated significantly superior performance compared to U-net in multi-target segmentation accuracy (mean Dice: 0.926 vs. 0.812). The five positioning metrics showed excellent agreement between AMM and reference standards (r = 0.93). ROC analysis indicated that RFFN performed significantly better in overall image quality classification (AUC, 0.982; 95% CI: 0.963, 0.993) compared to both TC (AUC, 0.959; 95% CI: 0.923, 0.995) and MLR (AUC, 0.953; 95% CI: 0.933, 0.974). Our study introduces a novel segmentation-based random forest fusion network that achieves accurate image positioning classification and identifies critical operational factors. Furthermore, the clinical interpretability of the fusion model was enhanced through the application of the SHAP method. 
Question: How can AI-driven interpretable methods be utilized to assess patient positioning in chest radiography and enhance radiographers' accuracy? Findings: The Random Forest Fusion Network (RFFN) outperformed Threshold Classification (TC) and Multivariate Logistic Regression (MLR) in positioning classification (AUC = 0.98). Clinical relevance: An integrated framework that combines deep learning and machine learning achieves accurate image positioning classification, identifies critical operational factors, enables expert-level image quality assessment, and delivers automated feedback to radiographers.
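The mean Dice values quoted for U-net++ and U-net measure overlap between a predicted mask and a reference mask. A minimal sketch of the standard Dice coefficient, with masks represented as sets of pixel coordinates (a representation chosen here for brevity, not taken from the study):

```python
def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets of pixel coordinates."""
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# Toy masks: two pixels each, one pixel shared.
predicted = {(0, 0), (0, 1)}
reference = {(0, 1), (1, 1)}
print(dice(predicted, reference))
```

A multi-target mean Dice, as reported in the abstract, would average this score over the segmented structures and images.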
- New
- Research Article
- 10.1021/acschemneuro.5c00649
- Nov 5, 2025
- ACS chemical neuroscience
- Avantika Bansal + 4 more
Alzheimer's disease (AD) is a progressive neurodegenerative disorder in which amyloid-β (Aβ) aggregation plays a pivotal role in its onset and progression. Inhibiting Aβ aggregation is a promising therapeutic strategy; however, its intrinsically disordered and conformationally flexible nature hinders both conventional and computational inhibitor design. Moreover, experimental development of Aβ inhibitors, encompassing molecular design, synthesis, and biological evaluation through repeated assays, is a slow, labor-intensive, and resource-intensive process. Therefore, robust design guidelines and predictive tools are essential for accelerating the discovery of Aβ inhibitors. To overcome these limitations, we developed a machine-learning-based, user-friendly web platform, Amylo-IC50Pred (https://amyloic50pred.vercel.app/), for rapid virtual screening of small molecules targeting Aβ aggregation. The platform integrates two classification models and one regression model, trained on 584 biologically validated compounds. For inhibitor-decoy discrimination, the Random Forest algorithm achieved perfect accuracy (100%). Potency classification into potent, moderately potent, and poor inhibitors was best achieved using Histogram-based Gradient Boosting (81% accuracy). The IC50 regression model, also based on Random Forest, achieved a coefficient of determination (R2) of 0.93, demonstrating strong predictive performance. 2D and 3D key molecular properties such as hydrophobicity, shape and charge distribution, and molecular symmetry were identified as critical contributors to model performance. Importantly, these identified properties provide valuable insights into the molecular features that govern Aβ aggregation inhibition and can serve as a foundation for rational design of potent and selective Aβ aggregation inhibitors. Amylo-IC50Pred thus represents a valuable resource for accelerating AD drug discovery.
- New
- Research Article
- 10.2214/ajr.25.33352
- Nov 5, 2025
- AJR. American journal of roentgenology
- Yong Li + 7 more
BACKGROUND. Habitat imaging provides a novel approach to capture spatial heterogeneity within lesions. OBJECTIVE. The purpose of this study was to develop a ternary-classification habitat model to characterize lung adenocarcinoma presenting as a subsolid nodule (SSN) on CT and to test the model's diagnostic performance compared with 2D and radiomic models. METHODS. This retrospective study included 747 patients (median age, 56 years; 241 men, 506 women) with 834 resected lung adenocarcinomas that presented as SSNs on low-dose CT between July 2018 and July 2023. Adenocarcinomas from one center were divided into training (n = 440) and internal test (n = 189) sets; adenocarcinomas from three other centers formed an external test set (n = 205). Adenocarcinomas were classified as noninvasive adenocarcinoma, grade 1 invasive adenocarcinoma (IAC), or grade 2 or 3 (hereafter, grade 2/3) IAC. Ternary-classification models were built in the training set using multivariable multinomial logistic regression analyses (2D model: diameter and consolidation-to-tumor ratio; habitat model: volume and volume ratio of attenuation-based subregions; radiomic model: extracted radiomic features; combined model: habitat and radiomic features). Performance was evaluated using macroaveraged and class-specific AUCs. RESULTS. The optimal number of habitats was four. The 2D, habitat, radiomic, and combined models had macroaveraged AUCs in the internal test set of 0.857, 0.909, 0.914, and 0.912 and in the external test set of 0.871, 0.919, 0.924, and 0.926, respectively. Those four models had class-specific AUCs in the external test set for noninvasive adenocarcinoma of 0.945, 0.956, 0.961, and 0.955; for grade 1 IAC of 0.792, 0.858, 0.857, and 0.862; and for grade 2/3 IAC of 0.875, 0.940, 0.952, and 0.961, respectively. 
In the external test set, macroaveraged AUCs and class-specific AUCs for grades 1 and 2/3 IAC were significantly higher for habitat, radiomic, and combined models versus the 2D model, but not for other model comparisons; class-specific AUCs for noninvasive adenocarcinoma were not significantly different for any model comparisons. CONCLUSION. The habitat model performed significantly better than the 2D model in ternary adenocarcinoma classification; its performance was not significantly different from the radiomic and combined models. CLINICAL IMPACT. The habitat model's combination of interpretability and diagnostic performance supports its utility for noninvasive risk stratification of SSNs encountered during lung cancer screening.
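The macroaveraged AUCs above are consistent with an unweighted mean of the three class-specific AUCs. The check below assumes that definition (the abstract does not spell out the computation) and uses only the external-test-set values stated in the abstract.

```python
def macro_average(values):
    """Unweighted mean across classes."""
    return sum(values) / len(values)


# External test set: class-specific AUCs in the order
# [noninvasive, grade 1 IAC, grade 2/3 IAC], then the stated macroaveraged AUC.
external = {
    "2D": ([0.945, 0.792, 0.875], 0.871),
    "habitat": ([0.956, 0.858, 0.940], 0.919),
    "radiomic": ([0.961, 0.857, 0.952], 0.924),
    "combined": ([0.955, 0.862, 0.961], 0.926),
}

recomputed = {model: macro_average(aucs) for model, (aucs, _stated) in external.items()}

for model, value in recomputed.items():
    stated = external[model][1]
    print(f"{model}: recomputed {value:.4f}, stated {stated}")
```

Differences of about 0.001 between recomputed and stated values are expected, since the class-specific AUCs are themselves rounded to three decimals.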