Related Topics

  • Class Imbalance Problem
  • Imbalanced Data Classification
  • Imbalanced Data
  • Imbalance Problem
  • Imbalanced Datasets
  • Imbalanced Learning

Articles published on Class imbalance

8970 Search results
  • New
  • Research Article
  • 10.70425/rml.202504.30
Intelligent classification with class imbalance for surrounding rock squeezing in underground engineering
  • Dec 9, 2025
  • Rock Mechanics Letters
  • Feng Gao + 3 more

Surrounding rock squeezing, a typical geological disaster, occurs frequently during underground excavation. To predict squeezing risk accurately and quickly, machine learning is introduced, which is well suited to the nonlinear relationship between surrounding rock squeezing and multiple disaster-triggering factors. However, the prediction performance of machine learning is greatly affected by the quality of the data. Consequently, this study takes class imbalance into account and proposes a novel intelligent paradigm for squeezing prediction. First, at the data level, the synthetic minority oversampling technique (SMOTE) is used to create a more balanced data environment. Then, to increase the utilization of the minority class, ensemble learning (EL) is used to mine data value and build prediction models, including extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), random forest (RF), and extremely randomized tree (ERT). At the same time, the whale optimization algorithm (WOA) is integrated to optimize the hyperparameters of SMOTE and EL. The results reveal that the SMOTE-ERT model performs best, with accuracy and macro F1-score reaching 93.94% and 0.9323, respectively. Additionally, comparative analysis affirms the necessity of SMOTE. Finally, taking the optimal model as an example, the contribution of each input parameter is quantitatively evaluated, and the strength-stress ratio plays the most important role in squeezing prediction.
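
The SMOTE step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name smote_sketch, the toy data, and the choice of k are assumptions made for illustration. It only shows the core idea of SMOTE, which is to create a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; production work would
    normally rely on a maintained library implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Toy minority class with two features (values are made up).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6)
```

Because each synthetic point lies on a segment between two existing minority points, it stays inside the region the minority class already occupies.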

  • New
  • Research Article
  • 10.1371/journal.pone.0335141
Flight delay prediction: Evaluating machine learning algorithms for enhanced accuracy
  • Dec 8, 2025
  • PLOS One
  • Sarah Ahmed A Albassam + 1 more

Flight delays pose substantial operational and economic challenges for airlines, directly affecting scheduling efficiency, resource allocation, and passenger satisfaction. Accurate prediction of arrival delays is therefore critical for optimizing airline operations and enhancing customer experience. This study systematically evaluates the predictive performance of six machine learning classifiers (Decision Tree, Random Forest, Support Vector Classifier (SVC), Logistic Regression, K-Nearest Neighbors (KNN), and Naive Bayes) on a comprehensive flight dataset, with particular attention to the challenges posed by class imbalance. To mitigate skewed class distributions, resampling techniques including Random Oversampling, Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN) were applied to the training data. Model performance was rigorously assessed using stratified 10-fold cross-validation and further validated on a hold-out test set, employing multiple evaluation metrics: Accuracy, F1-score, Matthews Correlation Coefficient (MCC), and ROC-AUC. The results demonstrate that Random Forest combined with Random Oversampling and Decision Tree combined with SMOTE both achieved the highest predictive performance (accuracy 0.90, F1-score 0.90, MCC 0.73, ROC-AUC 0.87). Notably, simpler models such as Naive Bayes exhibited competitive results under balanced conditions, underscoring the continued relevance of probabilistic classifiers in certain operational contexts. These findings highlight the critical role of resampling strategies and rigorous cross-validation in developing reliable, high-performing predictive models for imbalanced flight delay datasets, offering actionable insights for both airline operations and data-driven decision-making.
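
The abstract reports MCC alongside accuracy. As a rough sketch of why the two can diverge on imbalanced data, the snippet below computes the Matthews Correlation Coefficient from confusion-matrix counts. The counts are invented for illustration and are not taken from the study.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical classifier that labels every flight "on time"
# in a set with 90 on-time and 10 delayed flights.
tp, tn, fp, fn = 0, 90, 0, 10
accuracy = (tp + tn) / (tp + tn + fp + fn)
majority_only_mcc = mcc(tp, tn, fp, fn)
```

In this made-up case the accuracy is high while the MCC is zero, which is the kind of gap that motivates reporting both metrics.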

  • New
  • Research Article
  • 10.3389/fdgth.2025.1694486
Design and development of an mHealth application for pressure ulcer care and caregiver support
  • Dec 8, 2025
  • Frontiers in Digital Health
  • Shreenidhi Jogi + 6 more

Introduction: Smartphone accessibility has enabled the widespread use of mobile health applications for managing health conditions. While mobile technology is increasingly adopted globally, integrated digital solutions specifically supporting home-based pressure ulcer care remain limited. This study aimed to design and develop a mobile health (mHealth) application named IPI (Interprofessional Pressure Injury) application that integrates artificial intelligence-based pressure ulcer staging, caregiver-focused education, personalized nutritional support, and visual wound monitoring to assist caregivers and healthcare professionals in delivering timely and effective care. Methods: A comprehensive deep learning framework was developed using a clinically validated dataset of pressure ulcer images spanning six categories, including healthy tissue and Stage 1–4 ulcers. To address class imbalance and subtle inter-class variability, a class-adaptive augmentation pipeline and an enhanced Vision Transformer architecture with hierarchical feature representation and specialized self-attention were implemented. Training employed a stratified 5-fold cross-validation, class-balanced focal loss, regularization techniques, and a two-tiered ensemble strategy. Results: The proposed k-fold ensemble model achieved an accuracy of 0.9705 and macro F1 score of 0.9695, with perfect classification of Stage 4 ulcers and substantial improvements for underrepresented classes. Discussion: These results demonstrate the model's effectiveness for pressure ulcer classification, offering a robust foundation for real-time clinical decision support. The application supports remote monitoring, healing status detection, and educational access, especially in resource-limited settings. This holistic solution not only enhances caregiver confidence and independence but also aids clinicians in wound assessment and intervention planning. A future experimental study will validate the app's clinical utility, impact on patient outcomes, and potential to improve the quality of home-based pressure ulcer management.

  • New
  • Research Article
  • 10.3390/s25247437
TGDNet: A Multi-Scale Feature Fusion Defect Detection Method for Transparent Industrial Headlight Glass
  • Dec 6, 2025
  • Sensors
  • Zefan Zhang + 1 more

In industrial production, defect detection for automotive headlight lenses is an essential yet challenging task. Transparent glass defect detection faces several difficulties, including a wide variety of defect shapes and sizes, as well as the challenge of identifying transparent surface defects. To enhance the accuracy and efficiency of this process, we propose a computer vision-based inspection solution utilizing multi-angle lighting. For this task, we collected 2000 automotive headlight images to systematically categorize defects in transparent glass, with the primary defect types being spots, scratches, and abrasions. During data acquisition, we proposed a dataset augmentation method named SWAM to address class imbalance, ultimately generating the Lens Defect Dataset (LDD), which comprises 5532 images across these three main defect categories. Furthermore, we propose a defect detection network named the Transparent Glass Defect Network (TGDNet), designed based on common transparent glass defect types. Within the backbone of TGDNet, we introduced the TGFE module to adaptively extract local features for different defect categories and employed TGD, an improved SK attention mechanism, combined with a spatial attention mechanism to boost the network’s capability in multi-scale feature fusion. Experiments demonstrate that compared to other classical defect detection methods, TGDNet achieves superior performance on the LDD, improving the average detection precision by 6.7% in mAP and 8.9% in mAP50 over the highest-performing baseline algorithm.

  • New
  • Research Article
  • 10.1186/s40708-025-00279-6
Boosting brain tumor detection with an optimized ResNet and explainability via Grad-CAM and LIME
  • Dec 5, 2025
  • Brain Informatics
  • K Afnaan + 4 more

Detecting Brain Tumors is essential in medical imaging, as early and accurate diagnosis significantly improves treatment decisions and patient outcomes. Convolutional Neural Networks have demonstrated high efficiency in this domain, but their lack of interpretability remains a significant drawback for clinical adoption. This study explores the integration of Explainability techniques to enhance transparency in CNN-based classification and improve model performance through advanced optimization strategies. The primary research question addressed is how to improve the accuracy, generalization, and interpretability of CNNs for brain tumor Detection. While previous studies have demonstrated the effectiveness of deep learning for tumor detections, challenges such as class imbalance and overfitting of CNNs persist. To bridge this gap, we employ different dynamic learning rate modifiers, perform architectural enhancements, and apply XAI techniques, including Grad-CAM and LIME. Our experiments are conducted on three publicly available multiclass tumor datasets to ensure the generalizability of the proposed approach. Among the tested architectures, the enhanced ResNet model consistently outperformed others across all datasets, achieving the highest test accuracy, ranging from 99.36% to 99.65%. The techniques such as unfreezing layers, integrating various blocks, pooling, and dropout layers enhanced feature refinement and reduced overfitting. By incorporating XAI, we improve model interpretability, ensuring that clinically relevant regions in MRI scans are highlighted. These advancements contribute to highly reliable AI-assisted diagnostics, addressing significant challenges in medical image classification.

  • New
  • Research Article
  • 10.1038/s41598-025-28412-9
Robust evaluation of classical and quantum machine learning under noise, imbalance, feature reduction and explainability.
  • Dec 5, 2025
  • Scientific reports
  • Savita Kumari Sheoran + 2 more

Quantum machine learning (QML) has emerged as a promising paradigm for solving complex classification problems by leveraging the computational advantages of quantum systems. While most traditional machine learning models focus on clean, balanced datasets, real-world data is often noisy, imbalanced and high-dimensional, posing significant challenges for scalability and generalisation. This paper conducts an extensive experimental evaluation of five supervised classifiers (Decision Tree, K-nearest neighbour, Random Forest, linear regression, and support vector machines) in comparison with quantum machine learning classifiers (quantum support vector machine, quantum k-nearest neighbour, and variational quantum classifier) across five diverse datasets: Iris, Wine Quality, Breast Cancer, UCI Human Activity Recognition, and Pima Diabetes. To simulate real-world challenges, we introduce class imbalance using SMOTE and ADASYN Sampling, inject Gaussian noise into the features, and assess the impact of dimensionality reduction through ANOVA-based feature selection. Additionally, we utilise explainable AI tools, such as SHAP and LIME, to interpret model decisions. Our results demonstrate that Logistic Regression consistently performs well under various complexities, while Quantum Support Vector Machines show resilience to feature noise and class imbalance. The study also highlights the current capabilities and limitations of QML models, offering valuable insights into building generalisable and interpretable ML systems for deployment in complex environments. These insights are crucial for building robust, interpretable, and generalisable ML models for practical deployment.

  • New
  • Research Article
  • 10.1371/journal.pone.0330705
ConvLSTM-based tropical cyclone intensity estimation and classification using satellite imagery over the North Indian ocean
  • Dec 5, 2025
  • PLOS One
  • Manju M S + 5 more

Tropical cyclones pose significant threats to coastal regions and have a major negative influence on the environment and society. Precise cyclone identification and intensity estimation are crucial for effective early warning systems and disaster prevention. Traditional methods rely on manual interpretation and empirical models, often lacking efficiency and accuracy. This study proposes a deep learning framework that utilizes satellite image sequences for cyclone detection, classification, and intensity estimation. Unlike conventional models relying solely on spatial or manual features, the proposed hybrid architecture integrates Convolutional Neural Networks (CNNs) and ConvLSTM to learn spatiotemporal patterns jointly. Key innovations include the clustering-based cyclone region isolation method, sequence-level data augmentation, and the use of SMOTE to mitigate class imbalance. The proposed approach demonstrates substantial improvement in accuracy over baseline models, achieving 99.16% accuracy for binary classification using VGG16, and an accuracy of 81.1 ± 4.33% across cyclone intensity levels and an RMSE of 7.79 ± 1.27 knots in wind speed prediction using the ConvLSTM-based model. All models are evaluated using 5-fold cross-validation on the CIMSS Tropical Data Archive and IMD Best-Track datasets. Overall, these results show promising potential for the future use of deep learning in real-time forecasting and early warning systems. Future work will aim to improve model generalization through ensemble learning or through more complex architectures and larger datasets.

  • New
  • Research Article
  • 10.1177/03611981251393652
Cause Classifications and Analysis of Autonomous Vehicle Crash Narratives Using Improved Bidirectional Encoder Representations from Transformers and Latent Dirichlet Allocation Methods
  • Dec 4, 2025
  • Transportation Research Record: Journal of the Transportation Research Board
  • Jing Chen + 5 more

Safety is a critical factor in evaluating autonomous vehicles, and real-world crash data provide valuable insights for assessing autonomous vehicle (AV) safety performance. While structured AV crash data have been widely used to analyze general crash patterns, unstructured crash narratives contain rich contextual information that remains underutilized. These narratives offer in-depth descriptions of crash circumstances, making them essential for understanding AV crash causes. However, extracting meaningful insights from these narratives presents challenges such as data scarcity and class imbalance in cause classification. Therefore, this study utilizes an improved bidirectional encoder representations from transformers (BERT) model to classify sentences related to crash causes and then performs fine-grained cause analysis using the topic modeling method latent Dirichlet allocation. Text similarity between cause sentences and topic words is then computed for topic assignment. To address data scarcity and class imbalance in cause classification, a mixup data augmentation strategy and focal loss, respectively, are integrated into the BERT model. Experimental results on real California Department of Motor Vehicles crash reports show a significant improvement in cause sentence classification performance compared with baseline methods. Specifically, accuracy, precision, recall, F1-score, and area under curve increased by approximately 4.95%, 8.39%, 20.25%, 14.32%, and 10.16%, respectively. Topics of cause sentences are summarized into three groups: operational scene, location, and driving status in AV crashes. The results indicate that crashes are most common in operational scenes such as “traffic yielding,” “waiting to turn,” and “pedestrian yielding”. For location-related factors, crashes frequently occur at “intersections” and “stop signs”. Notably, within the driving status category, “manual operation” is the most critical factor.
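
Focal loss, named in this abstract as the remedy for class imbalance, can be sketched for a single binary prediction as follows. The abstract does not give the exact formulation or hyperparameters used, so the commonly cited binary form and the values of gamma and alpha below are assumptions for illustration.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example.

    p: predicted probability of the positive class (0 < p < 1)
    y: true label, 0 or 1
    gamma, alpha: illustrative values, not taken from the study
    """
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of easy examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, 1)   # confident and correct
hard = focal_loss(0.1, 1)   # confident and wrong
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is one way to sanity-check an implementation.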

  • New
  • Research Article
  • 10.59256/indjcst.20250403036
A Multimodal Machine Learning Framework for Class-Imbalanced Cognitive State Classification from High-Density EEG and Physiological Signals
  • Dec 4, 2025
  • Indian Journal of Computer Science and Technology
  • Swapnil Wanjare + 1 more

Classifying cognitive states from physiological data in safety-critical settings is difficult because of the severe class imbalance of real-world events. Based on a large-scale multimodal database (N > 21M) of EEG, ECG, respiration and GSR, we designed a baseline machine learning pipeline built on a Light Gradient Boosting Machine (LGBM) classifier. The model reached 93.02% accuracy on average but critically failed to identify minority classes, with recall scores as low as 0.25. This shows that standard accuracy is a highly misleading metric in this field. We use this benchmark to establish a direction toward reliable systems and to underline the urgency of methods that directly tackle class imbalance in safety-based physiological monitoring.
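
The gap between overall accuracy and minority-class recall described in this abstract can be reproduced on toy labels. The sketch below uses invented data (95 majority and 5 minority examples) and hypothetical helper names. It does not use the study's dataset or model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall_per_class(y_true, y_pred):
    """Recall for each class: correct predictions / true members of the class."""
    recalls = {}
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / total
    return recalls

# Made-up labels: class 1 is the rare event, and the model never predicts it.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rec = recall_per_class(y_true, y_pred)
macro_recall = sum(rec.values()) / len(rec)
```

Here the accuracy looks strong even though no rare event is detected, while macro-averaged recall exposes the failure.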

  • New
  • Research Article
  • 10.1038/s41598-025-27303-3
MedShieldFL-a privacy-preserving hybrid federated learning framework for intelligent healthcare systems
  • Dec 4, 2025
  • Scientific Reports
  • Dileep Kumar Murala + 3 more

Recent advances in artificial intelligence have greatly increased the accuracy of computer-assisted diagnosis for serious conditions including brain tumours. However, concerns about data privacy, class imbalance, and the diversity of medical datasets limit the application of centralised deep learning models in healthcare. This article introduces MedShieldFL, a hybrid privacy-preserving federated learning architecture that enables secure and decentralised brain tumour classification across many medical institutions. The approach uses data augmentation techniques to reduce class imbalance and homomorphic encryption to safely aggregate model updates while safeguarding sensitive patient data. The base model is a ResNet-18-based classifier chosen to balance accuracy and speed. The test results show that MedShieldFL classifies data correctly 93% to 96% of the time. This approach improves performance by about 2% compared to traditional federated learning models while keeping data sufficiently private. The framework ensures that the overhead encryption adds in real-world deployments stays within acceptable limits, keeping execution times reasonable. MedShieldFL is a useful and flexible privacy-preserving technology for medical image evaluation, which makes it easier for current healthcare systems to adopt AI that is safe and works with other AI.

  • New
  • Research Article
  • 10.1002/bcp.70377
Machine learning methods for predicting adverse drug events: A systematic review.
  • Dec 4, 2025
  • British journal of clinical pharmacology
  • Niaz Chalabianloo + 8 more

Predicting adverse drug events (ADEs) in outpatient settings is crucial for improving medication safety, identifying high-risk patients and reducing health-care costs. While traditional methods struggle with the complexity of health-care data, machine learning (ML) models offer improved prediction capabilities; however, their effectiveness in ADE prediction remains unclear. This systematic review evaluated ML algorithms used for this purpose, analysing studies that focussed on outpatient care or utilized large-scale data sources (e.g. electronic health records, administrative claims and spontaneous reporting systems) that primarily represent the outpatient continuum. We systematically searched MEDLINE and Embase up to December 2024 to identify studies developing or validating ML models for ADE prediction. Study characteristics, ML methods, ADE types, model performance and risk of bias were assessed using the PROBAST tool. From 59 included studies comprising 191 ML implementations, Logistic regression, Random forest and XGBoost emerged as the most commonly used algorithms. The majority of studies (67.8%) reported area under the curve (AUC), with 85% demonstrating moderate to high performance (AUC > 0.70) for internal validation. However, only 33.9% of studies addressed class imbalance, and merely 18.6% conducted external validation, raising concerns about methodological rigour, particularly in missing data handling and validation procedures. Our findings indicate that ML models, especially ensemble methods, show promise in predicting ADEs, although challenges with class imbalance and limited external validation currently hinder their clinical applicability. 
Future research should focus on adopting more rigorous methodologies and developing specialized frameworks for ML-based ADE prediction that build upon established pharmacovigilance practices to ensure models are accurate, generalizable, and seamlessly integrated into clinical workflows for ongoing monitoring and improved medication safety.

  • New
  • Research Article
  • 10.56536/jicet.v4i1.223
EARLY-STAGE BRAIN TUMOR SEGMENTATION BY USING DEEP LEARNING APPROACH
  • Dec 4, 2025
  • Journal of Innovative Computing and Emerging Technologies
  • Shafia Kiran

Abstract Early-stage brain tumor segmentation in Magnetic Resonance Imaging (MRI) is critical for timely diagnosis, treatment planning, and improving patient outcomes. Gliomas, the most aggressive primary brain tumors, exhibit significant variations in shape, size, and location, making automated segmentation a challenging task. This paper presents a novel modified 3D U-Net architecture enhanced with Residual Dropout Blocks (RDB) and Attentional Residual Dropout Blocks (ARDB) to address contextual information loss and class imbalance in glioma segmentation. The model leverages residual learning to mitigate gradient degradation in deep networks, while dropout layers reduce overfitting. Additionally, soft attention gates in the decoder path dynamically highlight tumor regions, improving segmentation precision. To handle class imbalance between tumor and non-tumor voxels, a hybrid loss function combining Dice loss and Focal loss is introduced. The model is rigorously evaluated on BraTS2018, BraTS2019, and BraTS2020 datasets, achieving mean Dice scores of 0.9020 (Whole Tumor), 0.9016 (Tumor Core), and 0.9170 (Enhancing Tumor) on BraTS2018; 0.9194, 0.9190, and 0.9304 on BraTS2019; and 0.9275, 0.9279, and 0.9365 on BraTS2020, respectively, outperforming state-of-the-art methods including nnU-Net and context-aware 3D U-Net. Key innovations include the RDB-ARDB block design, which optimizes feature reuse, and attention mechanisms, which reduce false positives in noisy regions. The proposed framework demonstrates robustness to multimodal MRI variability, computational efficiency, and clinical applicability, contributing significantly to AI-driven neuroimaging. Keywords: Brain tumor segmentation, 3D U-Net, Residual Dropout, Attention mechanisms, Deep learning, MRI, BraTS challenge.
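
The abstract combines Dice loss with Focal loss to cope with the scarcity of tumor voxels. A minimal sketch of the Dice part on a flattened toy mask is shown below. The smoothing constant eps, the function name, and the 98-to-2 voxel split are assumptions for illustration, and the abstract does not state how the two losses are weighted.

```python
def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flattened voxel probabilities and a binary mask.

    eps is a small smoothing term (an assumed value) that avoids
    division by zero when both masks are empty.
    """
    intersection = sum(p * g for p, g in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)

# Toy volume: 98 background voxels and 2 tumor voxels (made-up data).
target = [0] * 98 + [1] * 2
all_background = [0.0] * 100
perfect = [float(g) for g in target]

voxel_accuracy = sum(
    round(p) == g for p, g in zip(all_background, target)
) / len(target)
loss_all_background = soft_dice_loss(all_background, target)
loss_perfect = soft_dice_loss(perfect, target)
```

Predicting background everywhere scores well on voxel accuracy but receives a Dice loss close to 1, because Dice measures overlap with the tumor region only.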

  • New
  • Research Article
  • 10.1088/1361-6501/ae1e21
Adaptive attenuation self-attention adversarial network for cross-domain fault diagnosis under imbalanced conditions
  • Dec 4, 2025
  • Measurement Science and Technology
  • Fucan Huang + 6 more

Abstract The acquisition of fault samples is often constrained by factors such as the low occurrence probability of faults and the high costs associated with data collection, leading to an imbalanced distribution of fault types and impairing the ability of the model to accurately recognize minority fault classes. In response to this challenge, this paper proposes an adaptive attenuation self-attention adversarial network (AASAN). In the feature extraction phase, the model incorporates an attention mechanism utilizing a spatial attenuation matrix and bidirectional decomposition, which effectively mitigates the interference of irrelevant distant information while enhancing the ability of the model to capture local feature representations and reducing computational overhead. In the classification phase, an adaptive loss-weighting strategy is introduced, which dynamically adjusts loss weights based on both sample distribution and variations in inter-class accuracy, thereby improving the recognition performance of minority fault classes. For domain adaptation, the model combines conditional domain adversarial network (CDAN), guided by an entropy filtering mechanism, with supervised contrastive learning (SupCon) to align features across domains and mitigate the adverse effects of class imbalance. Experimental results on two bearing datasets demonstrate that the proposed model significantly outperforms comparison methods across multiple evaluation metrics, validating its effectiveness and potential for real-world deployment in complex operating conditions.

  • New
  • Research Article
  • 10.1038/s41598-025-30913-6
An efficient dimensionality reduction framework using metaheuristic optimization with deep learning models for amyotrophic lateral sclerosis disease progression prediction.
  • Dec 4, 2025
  • Scientific reports
  • Mesfer Al Duhayyim

Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disease that affects the motor neuron populations of the spinal cord, brainstem, and cerebral cortex, resulting in progressive disability and death from respiratory difficulty. ALS is a highly heterogeneous disorder with symptoms such as muscle weakness, difficulty in swallowing, speaking, and breathing, and changes in mental and emotional health. The disease therefore requires more effective medication, and successful treatment is hampered by heterogeneous disease progression, which complicates patient stratification. Recently, many studies have applied deep learning (DL), machine learning (ML), and, more broadly, artificial intelligence (AI). This paper presents a Dimensionality Reduction Framework Using Metaheuristic Optimization with Deep Learning Models for the Amyotrophic Lateral Sclerosis Disease Progression Prediction (DRMODL-ALSDP) method. The aim is to provide an effective model for predicting the progression of ALS using advanced techniques. Initially, the data pre-processing stage applies min-max normalization to transform raw data into a suitable format. SMOTE is then employed to address class imbalance by upsampling the minority classes in disease progression stages. Next, the binary swordfish movement optimization algorithm (BSMOA) is used for feature selection. A hybrid of a temporal convolutional network and long short-term memory with attention mechanism (TCN-LSTM-AM) is employed for classification. Finally, the marine predators algorithm (MPA) fine-tunes the hyperparameter values and improves classification performance. An extensive simulation is performed to verify the performance of the DRMODL-ALSDP model. The comparison study showed that the DRMODL-ALSDP model achieved a superior accuracy of 98.17% over existing methods.

  • New
  • Research Article
  • 10.55041/ijsrem54844
Hybrid XGBoost–Random Forest Ensemble Model for Early Prediction of Type-2 Diabetes Using Multimodal Clinical and Lifestyle Data
  • Dec 4, 2025
  • International Journal of Scientific Research in Engineering and Management
  • R Sumathi + 1 more

Abstract Type-2 Diabetes Mellitus (T2DM) is a rapidly escalating global health challenge, often diagnosed only after clinical symptoms appear, leading to delayed intervention and higher risk of complications. Early prediction using automated computational approaches can significantly improve disease prognosis and reduce healthcare costs. This research proposes a machine learning-driven predictive framework for early detection of Type-2 diabetes by integrating multimodal data, including clinical parameters (such as glucose level, blood pressure, insulin, BMI), demographic attributes, and lifestyle indicators (dietary habits, physical activity, stress level, and sleep patterns). The dataset underwent preprocessing techniques such as normalization, missing value imputation, correlation-based feature selection, and class imbalance handling using SMOTE. Multiple machine learning algorithms—including Logistic Regression, Random Forest, Support Vector Machine, Gradient Boosting, and Extreme Gradient Boosting (XGBoost)—were trained and evaluated to identify the best-performing model. Model performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC metrics. The XGBoost model achieved superior predictive accuracy and demonstrated strong generalization capability across test samples. Furthermore, explainable AI (XAI) techniques such as SHAP values were employed to interpret feature importance and enhance clinical transparency. Results indicate that lifestyle factors combined with clinical metrics significantly improve predictive performance compared to clinical data alone. The proposed framework shows potential for integration into digital health platforms and preventive screening systems, aiding clinicians in early risk stratification and personalized intervention. Keywords Type-2 Diabetes Prediction, Machine Learning, Multimodal Healthcare Data, XGBoost, Predictive Analytics, Lifestyle Factors, Explainable AI (XAI), Early Diagnosis.

  • New
  • Research Article
  • 10.1038/s41531-025-01218-2
Subitem-level multi-scale assessment and machine learning for three-class cognitive status classification in Parkinson's disease.
  • Dec 4, 2025
  • NPJ Parkinson's disease
  • Ying-Che Chen + 2 more

People with Parkinson's disease (PD) frequently develop cognitive impairments, and early accurate classification of cognitive status is critically important for clinical intervention. In this study, we leveraged data from the Parkinson's Progression Markers Initiative (PPMI) to develop a two-stage machine-learning framework that distinguishes among three cognitive states: PD with normal cognition (PD-NC), PD with mild cognitive impairment (PD-MCI), and PD dementia (PDD). Our approach combined SHapley Additive exPlanations (SHAP) for model interpretability with an ensemble of XGBoost and multilayer perceptron (MLP) classifiers, addressing class imbalance via the SMOTE-Tomek method. All model development and validation were conducted with a strict hold-out evaluation, with the test-set entirely excluded from feature selection, model training, and threshold optimization. Independent validation demonstrated strong and balanced classification performance across all cognitive subgroups, with particularly effective identification of cognitively impaired individuals requiring clinical attention. The area under the receiver operating characteristic curve (AUC) for three-class discrimination exceeded 0.85. Key predictors, including Montreal Cognitive Assessment (MoCA) scores and activities of daily living assessments, were validated as clinically meaningful by SHAP analysis. The proposed two-stage explainable model demonstrates strong and balanced classification performance across cognitive subgroups in PD. Its ability to identify people at high risk for dementia highlights its potential utility in clinical workflows, particularly as a scalable tool for early cognitive stratification and decision support in routine neurology practice. However, external validation on diverse real-world cohorts is warranted before clinical implementation.

  • New
  • Research Article
  • 10.1038/s41598-025-27338-6
Evaluation of various traditional machine learning techniques for predicting the acute effect of different hamstring muscle stretching methods among male soccer players.
  • Dec 4, 2025
  • Scientific reports
  • Elham Hosseini + 6 more

This study investigated the acute effects of static (SS), dynamic (DS), and ballistic (BS) hamstring stretching on performance in male soccer players and applied machine learning (ML) to predict protocol efficacy. A total of 249 players with and without hamstring shortening completed each protocol across three sessions with 72 h of rest. Hamstring shortening was classified via the passive knee extension test (> 32.2° knee angle). Flexibility, strength, sprint, power, and agility were measured pre- and post-stretching. Each protocol consisted of 4 sets × 30 s (holds/swings/bounces at 50-60 bpm) with 10 s rest. ML models (k-NN, SVM, random forest) were trained on pre-post difference scores, with feature selection applied to identify key predictors and the Synthetic Minority Over-sampling Technique (SMOTE) used to address class imbalance. Findings indicate that SS is the best option for acutely improving flexibility, whereas DS offers broader immediate performance benefits for a subsequent activity. Combining feature selection and data balancing increased k-NN accuracy to 53% (only about 20 percentage points above the chance level of 33.3% for this three-class problem). The exploratory ML analysis with SMOTE reached a peak accuracy of 53.06%, demonstrating the promise of the approach but also highlighting the challenges of predicting individual responses to stretching interventions and underscoring the need for larger datasets and more advanced models.
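
The comparison against chance reported above is simple arithmetic, sketched here for clarity. The helper names are hypothetical, and the 1/k chance level assumes uniform guessing over k equally likely classes, which matches the 33.3% baseline quoted for this three-class problem.

```python
def chance_level(n_classes: int) -> float:
    """Accuracy of uniform random guessing over n_classes balanced classes."""
    return 1.0 / n_classes


def margin_over_chance(accuracy: float, n_classes: int) -> float:
    """Difference between observed accuracy and chance, as a fraction."""
    return accuracy - chance_level(n_classes)


# Figures quoted in the abstract: 53.06% peak accuracy, three classes
margin = margin_over_chance(0.5306, 3)
print(f"{margin * 100:.1f} percentage points above chance")  # prints 19.7
```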

  • New
  • Research Article
  • 10.3390/jimaging11120433
DiagNeXt: A Two-Stage Attention-Guided ConvNeXt Framework for Kidney Pathology Segmentation and Classification
  • Dec 4, 2025
  • Journal of Imaging
  • Hilal Tekin + 2 more

Accurate segmentation and classification of kidney pathologies from medical images remain a major challenge in computer-aided diagnosis due to complex morphological variations, small lesion sizes, and severe class imbalance. This study introduces DiagNeXt, a novel two-stage deep learning framework designed to overcome these challenges through an integrated use of attention-enhanced ConvNeXt architectures for both segmentation and classification. In the first stage, DiagNeXt-Seg employs a U-Net-based design incorporating Enhanced Convolutional Blocks (ECBs) with spatial attention gates and Atrous Spatial Pyramid Pooling (ASPP) to achieve precise multi-class kidney segmentation. In the second stage, DiagNeXt-Cls utilizes the segmented regions of interest (ROIs) for pathology classification through a hierarchical multi-resolution strategy enhanced by Context-Aware Feature Fusion (CAFF) and Evidential Deep Learning (EDL) for uncertainty estimation. The main contributions of this work include: (1) enhanced ConvNeXt blocks with large-kernel depthwise convolutions optimized for 3D medical imaging, (2) a boundary-aware compound loss combining Dice, cross-entropy, focal, and distance transform terms to improve segmentation precision, (3) attention-guided skip connections preserving fine-grained spatial details, (4) hierarchical multi-scale feature modeling for robust pathology recognition, and (5) a confidence-modulated classification approach integrating segmentation quality metrics for reliable decision-making. Extensive experiments on a large kidney CT dataset comprising 3847 patients demonstrate that DiagNeXt achieves 98.9% classification accuracy, outperforming state-of-the-art approaches by 6.8%. The framework attains near-perfect AUC scores across all pathology classes (Normal: 1.000, Tumor: 1.000, Cyst: 0.999, Stone: 0.994) while offering clinically interpretable uncertainty maps and attention visualizations. The superior diagnostic accuracy, computational efficiency (6.2× faster inference), and interpretability of DiagNeXt make it a strong candidate for real-world integration into clinical kidney disease diagnosis and treatment planning systems.
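
A compound segmentation loss of the kind listed in contribution (2) is typically a weighted sum of its component terms. The sketch below illustrates that idea for a binary mask in plain Python. It is not the DiagNeXt implementation: the function names, the equal default weights, and `gamma=2.0` are assumptions, and the distance transform term is left out.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """1 - Dice coefficient between predicted probabilities and a 0/1 mask."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy."""
    terms = (
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(probs, targets)
    )
    return -sum(terms) / len(probs)


def focal_loss(probs, targets, gamma=2.0, eps=1e-12):
    """Mean focal loss: cross-entropy down-weighted for easy examples."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p  # probability of the true class
        total += -((1.0 - p_t) ** gamma) * math.log(p_t + eps)
    return total / len(probs)


def compound_loss(probs, targets, w_dice=1.0, w_ce=1.0, w_focal=1.0):
    """Weighted sum of the three terms (weights are illustrative)."""
    return (
        w_dice * dice_loss(probs, targets)
        + w_ce * cross_entropy(probs, targets)
        + w_focal * focal_loss(probs, targets)
    )


# Hypothetical per-pixel foreground probabilities and ground-truth mask
pixel_probs = [0.9, 0.2, 0.6]
pixel_mask = [1, 0, 1]
print(round(compound_loss(pixel_probs, pixel_mask), 4))
```

The Dice term responds to region overlap and is less sensitive to the foreground/background ratio, while the focal term reduces the contribution of pixels that are already classified confidently, which is why such terms are often combined when lesions are small.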

  • New
  • Research Article
  • 10.3390/futuretransp5040190
From Prediction to Prevention: Identifying Actionable Crash Factors Through ML and Narrative-Based Sensitivity Testing
  • Dec 4, 2025
  • Future Transportation
  • Mohammad Zana Majidi + 2 more

Crashes on roadways continue to represent a major global public health concern due to high rates of death and injury, underscoring the need for predictive tools that can identify high-risk conditions and guide prevention strategies. This study develops a framework that combines structured crash records and road information with unstructured police narratives to predict injury severity using machine learning and natural language processing (NLP). The dataset is used to train, validate, and test nine models, combining three algorithms (Random Forest, AdaBoost, and XGBoost) with two NLP methods (TF-IDF and Word2Vec). Model performance is evaluated using macro-average F1-scores to address severe class imbalance. Results show that XGBoost with TF-IDF achieves the best performance (macro-F1 = 0.644), demonstrating measurable improvements from incorporating narrative features compared to structured data alone. Beyond prediction, a simulation-based sensitivity analysis is conducted on the top 100 features, identifying 11 variables with the greatest impact on severity outcomes in Kentucky. Seatbelt non-use, occupant entrapment, and impaired driver control emerge as the most influential factors, with simulated improvements leading to notable reductions in fatalities and major injuries. The study introduces a “prediction-to-prevention” framework that links injury severity prediction with simulation-based sensitivity analysis. By integrating structured and narrative crash data, the framework identifies how changes in key behavioral and roadway factors can shift injury outcomes toward less severe levels. These findings highlight the dual contribution of this study: improving predictive accuracy through narrative integration and offering actionable insights to support evidence-based traffic safety interventions.
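
Macro-averaged F1, the metric used above, gives every class equal weight regardless of how many samples it has, which is why it is preferred over accuracy when severe outcomes are rare. The following is a minimal standard-library sketch of the standard definition; the function name and the toy labels are illustrative and are not taken from the study's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced example: a model that always predicts "minor"
truth = ["minor"] * 8 + ["fatal"] * 2
always_minor = ["minor"] * 10

accuracy = sum(t == p for t, p in zip(truth, always_minor)) / len(truth)
print(f"accuracy = {accuracy:.3f}")                       # 0.800
print(f"macro-F1 = {macro_f1(truth, always_minor):.3f}")  # 0.444
```

The always-majority classifier looks acceptable by accuracy but scores poorly on macro-F1 because its F1 for the rare class is zero.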

  • New
  • Research Article
  • 10.3390/jcp5040109
MalVis: Large-Scale Bytecode Visualization Framework for Explainable Android Malware Detection
  • Dec 4, 2025
  • Journal of Cybersecurity and Privacy
  • Saleh J Makkawy + 2 more

As technology advances, developers continually create innovative solutions to enhance smartphone security. However, the rapid spread of Android malware poses significant threats to devices and sensitive data. The open-source nature of the Android Operating System (OS) and the availability of its Software Development Kit (SDK) are major contributors to this alarming growth. Conventional malware detection methods, such as signature-based, static, and dynamic analysis, face challenges in detecting obfuscation techniques in malware, including encryption, packing, and compression. Although developers have created several visualization techniques for malware detection using deep learning (DL), they often fail to accurately identify the critical malicious features of malware. This research introduces MalVis, a unified visualization framework that integrates entropy and N-gram analysis to emphasize meaningful structural and anomalous operational patterns within the malware bytecode. By addressing significant limitations of existing visualization methods, such as insufficient feature representation, limited interpretability, small dataset sizes, and restricted data access, MalVis delivers enhanced detection capabilities, particularly for obfuscated and previously unseen (zero-day) malware. The framework leverages the MalVis dataset introduced in this work, a publicly available large-scale dataset comprising more than 1.3 million visual representations in nine malware classes and one benign class. A comprehensive comparative evaluation was performed against existing state-of-the-art visualization techniques using leading convolutional neural network (CNN) architectures: MobileNet-V2, DenseNet201, ResNet50, VGG16, and Inception-V3. To further boost classification performance and mitigate overfitting, the outputs of these models were combined using eight distinct ensemble strategies. To address the issue of imbalanced class distribution in the multiclass dataset, we employed an undersampling technique to ensure balanced learning across all types of malware. MalVis achieved superior results, with 95% accuracy, 90% F1-score, 92% precision, 89% recall, 87% Matthews Correlation Coefficient (MCC), and 98% Receiver Operating Characteristic Area Under Curve (ROC-AUC). These findings highlight the effectiveness of MalVis in providing interpretable and accurate representation features for malware detection and classification, making it valuable for research and real-world security applications.
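
The abstract does not say which undersampling technique was used. One common variant, random undersampling, trims every class to the size of the smallest one. The sketch below shows that variant only as an illustration; the function name, the seed, and the toy class names are assumptions and do not describe the MalVis pipeline.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly keep an equal number of samples per class.

    Every class is reduced to the size of the smallest class.
    Illustrative sketch of random undersampling, not the authors' method.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    n_keep = min(len(items) for items in by_class.values())
    kept_samples, kept_labels = [], []
    for label in sorted(by_class):
        for sample in rng.sample(by_class[label], n_keep):
            kept_samples.append(sample)
            kept_labels.append(label)
    return kept_samples, kept_labels


# Hypothetical imbalanced multiclass data (sample ids and class names)
ids = list(range(16))
classes = ["adware"] * 10 + ["benign"] * 4 + ["ransomware"] * 2

balanced_ids, balanced_classes = random_undersample(ids, classes)
print(Counter(balanced_classes))
```

Random undersampling discards majority-class data, so it suits large datasets where each class still retains enough samples after trimming.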

