
Related Topics

  • Gradient Boosting Decision Tree
  • Stochastic Gradient Boosting
  • Boosted Trees

Articles published on Gradient boosting

19505 Search results
  • New
  • Research Article
  • 10.1016/j.jad.2025.120679
Mental health at risk: Predicting psychological distress in Australian youth through machine learning models.
  • Feb 15, 2026
  • Journal of affective disorders
  • Benojir Ahammed + 3 more

  • New
  • Research Article
  • 10.1159/000550910
A Machine Learning-Based Prognostic Model for Sepsis-Associated Liver Injury Using Routine Indicators.
  • Feb 14, 2026
  • Medical principles and practice : international journal of the Kuwait University, Health Science Centre
  • Wenjun Zhu + 5 more

Sepsis-associated liver injury (SALI) occurs in approximately 40% of sepsis cases and is linked to high mortality, a challenge that may stem from the absence of effective prognostic models. We developed a machine learning (ML)-based prognostic model for SALI using conventional biomarkers to guide precise clinical interventions and reduce mortality. We retrospectively analyzed 307 SALI patients (2010-2024), stratified into favorable (n=139) and poor (n=168) prognosis groups by post-treatment progression. The cohort was randomly split into a training set (80%) and a validation set (20%). The routine biomarkers included hematological indices, liver/renal function parameters, and coagulation profiles. Feature selection used LASSO regression. Nine machine learning algorithms were used to construct prognostic models, including eXtreme Gradient Boosting, Logistic Regression, Light Gradient Boosting Machine, Random Forest, Adaptive Boosting, Gradient Boosting Decision Tree, Gaussian Naive Bayes, and Multilayer Perceptron. Model interpretability was evaluated via the SHapley Additive exPlanation (SHAP) algorithm. An independent cohort of 37 SALI patients was used for external validation. Key parameters influencing SALI prognosis were red blood cell distribution width-coefficient of variation (RDW-CV), anion gap (AG), and high-sensitivity cardiac troponin (hs-cTn). Among the nine models, the Random Forest prognostic model performed best, with an area under the curve (AUC) of 0.816 in the validation set and 0.781 in the external validation. The Random Forest model developed in this study may provide some guidance for clinical decision-making in SALI patients, but it requires further validation and research before it is implemented in clinical practice.

  • New
  • Research Article
  • 10.1093/gerona/glag031
Biomarkers help us understand how cellular and systemic aging contribute to mortality: A study utilizing a machine learning approach in the Health and Retirement Study.
  • Feb 14, 2026
  • The journals of gerontology. Series A, Biological sciences and medical sciences
  • Eric T Klopack + 1 more

Research suggests aging is a coordinated physiological decline occurring in multiple systems and at multiple biological levels. However, it is largely unknown how general biological aging and specific systemic aging co-occur and influence one another to affect health outcomes. There is also emerging interest in understanding how social exposures may differentially accelerate decline in individual physiological systems. We utilize data from the Health and Retirement Study, a nationally representative sample of about 4000 US adults over age 55. We used eXtreme Gradient Boosting (xgboost) in a training subsample to create system-specific mortality risk scores based on sets of biomarkers representing biological systems (e.g., brain and nervous system, adaptive immune system, cardiovascular system, renal system) as well as general multisystem aging. Results suggest that the effects of most biological systems may be well captured by one or a small number of biomarkers and that female sex appears to be a protective or risk factor depending on specific biological system. The importance of studying both general and system-specific aging is discussed.

  • New
  • Research Article
  • 10.2196/80156
Research on the Prediction of Coal Workers' Pneumoconiosis Based on Easily Detectable Clinical Data: Machine Learning Model Development and Validation Study.
  • Feb 13, 2026
  • JMIR medical informatics
  • Haiquan Li + 7 more

Coal workers' pneumoconiosis (CWP) is the most prevalent occupational disease that causes irreversible lung damage. Early prediction of CWP is the key to blocking the irreversible process of pulmonary fibrosis. The prediction of CWP based on imaging data and biomarker detection is constrained due to high cost and poor convenience. The study aimed to use easily detectable clinical data to construct a prediction model for CWP through machine learning (ML) methods. A prediction framework was established using a moderate-sized dataset and multidimensional clinical features, including occupational information, lung function parameters, and blood indicators. Six ML algorithms (light gradient boosting machine, random forest, extreme gradient boosting, categorical boosting, support vector machine, and logistic regression) were trained and evaluated using a stratified 5-fold cross-validation and a held-out test set. Hyperparameter optimization was performed using a unified Optuna-based strategy to ensure fair comparison across models. Model interpretability was assessed using Shapley Additive Explanation on top-performing models. In addition, an ablation analysis was conducted by retraining models after excluding job type to assess the independent predictive value of clinical biomarkers. All 6 models achieved consistently high predictive performance, and the differences among the top-performing models were small on the test set. After Optuna-based optimization, light gradient boosting machine and categorical boosting achieved high test-set area under curve values (0.974 and 0.975, respectively), while extreme gradient boosting achieved the highest recall (0.926) and F1-score (0.952). Compared with the baseline models, hyperparameter optimization resulted in only minor performance changes, indicating robust prediction under the current feature set and evaluation protocol. 
Shapley Additive Explanation analysis consistently identified age, forced expiratory volume/forced vital capacity, and platelet count as key contributors to CWP risk prediction. The ablation analysis further showed that model performance remained strong after removing job type, supporting the independent predictive value of clinical features beyond occupational history. The research results have confirmed the potential of combining simple multidimensional features with ML algorithms for predicting CWP and provided new ideas for early diagnosis and intervention of patients with CWP.
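
The abstract above evaluates its models with stratified 5-fold cross-validation. As a rough illustration of what stratification means here (this is not the authors' code; the function name `stratified_k_fold` and the toy labels are assumptions made for this sketch), the following Python snippet deals the indices of each class round-robin across the folds, so that every fold keeps roughly the overall class ratio.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with class ratios preserved per fold.

    Hypothetical helper for illustration; labels are assumed to be sortable
    (e.g. 0/1 integers).
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(
            idx for j in range(k) if j != held_out for idx in folds[j]
        )
        yield train_idx, test_idx


# Toy labels (assumed): 20 negatives followed by 10 positives.
toy_labels = [0] * 20 + [1] * 10
toy_folds = list(stratified_k_fold(toy_labels, k=5, seed=0))
```

With these toy labels, each test fold holds 4 negatives and 2 positives, matching the 2:1 ratio of the full set. A separate held-out test set, as described in the abstract, would be split off before this procedure is applied.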

  • New
  • Research Article
  • 10.1126/sciadv.aeb1323
Navigating high-dimensional processing parameters in organic photovoltaics via a multitier machine learning framework.
  • Feb 13, 2026
  • Science advances
  • Yaping Wen + 2 more

Optimizing organic photovoltaic (OPV) performance requires navigating the high-dimensional, interdependent processing parameters governing bulk heterojunction morphology. To address this, we have constructed a standardized database integrating donor/acceptor pairs, nine key fabrication parameters, and device efficiencies, consolidating over a decade of experimental results. Leveraging this resource, we developed a three-tiered machine learning framework using gradient boosting regression trees. The strategy progresses from single-parameter baseline models to stage-combined models that capture intraprocess synergies, culminating in a global nine-parameter optimization model. This final model achieves a Pearson correlation of >0.9 and a success rate of >80% in identifying optimal multiparameter configurations. Validation on 78 external systems, each containing a previously unseen donor or acceptor, demonstrates robust generalization with >75% accuracy in predicting the optimal or secondary condition for individual parameters. This work establishes a practical, data-driven framework for accelerating the rational optimization of OPV photoactive layers.

  • New
  • Research Article
  • 10.1080/14796694.2026.2630630
Radiomics analysis of MRI improves prediction of lymph node metastasis in laryngeal squamous cell carcinoma.
  • Feb 13, 2026
  • Future oncology (London, England)
  • Bingying Li + 6 more

To explore the role of multi-sequence magnetic resonance imaging (MRI) images in preoperative prediction of lymph node metastasis in laryngeal squamous cell carcinoma (LSCC). Patients with LSCC undergoing open surgery and lymph node dissection were enrolled (n = 224 training, n = 96 testing). Radiomic features (n = 2394) were extracted from T1-enhanced and T2-weighted images. Features were screened using least absolute shrinkage and selection operator (LASSO) regression, and the best-performing classification model was identified among Logistic Regression, Random Forest, Extreme Gradient Boosting, and Light Gradient Boosting Machine. An imaging biomarker-based nomogram integrating radiomic and clinical features was developed via logistic regression. LASSO regression identified 14 stable features (6 from T1-enhanced images, 8 from T2-weighted). The Random Forest model showed the best radiomics-only performance (area under the receiver operating characteristic curve [AUC]: 0.877 training; 0.875 testing). The combined clinical - radiomics nomogram achieved higher discrimination (AUC: 0.942 training; 0.908 testing), outperforming standalone clinical or radiomic models. The radiomic-clinical nomogram enhances preoperative prediction of cervical lymph node metastasis in LSCC, offering the potential to optimize clinical decision-making.

  • New
  • Research Article
  • 10.3390/su18041944
Assessing Proxy-Based Grassland Gross Primary Productivity Using Machine Learning Approaches and Multi-Source Remote Sensing
  • Feb 13, 2026
  • Sustainability
  • Tsolmon Sodnomdavaa

Gross Primary Productivity (GPP) in grassland ecosystems is a fundamental eco-biophysical indicator for assessing carbon cycling, grazing capacity, and ecosystem responses to climatic stress. However, robust estimation of GPP in arid and semi-arid rangelands remains challenging because of pronounced spatial heterogeneity, strong climate variability, and inherent uncertainties associated with remotely sensed observations. Together, these factors constrain both modeling performance and out-of-sample generalization beyond the training domain. In this dryland grassland context, this study compares the performance of machine learning (ML) models for grassland GPP proxy-based characterization, downscaling, and predictive agreement using a multivariate dataset that integrates Sentinel-2-derived spectral and phenological features, a Moderate-Resolution Imaging Spectroradiometer (MODIS)-derived GPP proxy, and complementary climatic and geographic information. Pixel-level observations spanning multiple years are analyzed, with ordinary linear regression used as a baseline benchmark and ensemble decision-tree models, including Random Forest, Gradient Boosting, and Histogram-based Gradient Boosting (HGB), compared. Instead of relying solely on random cross-validation, model performance is systematically assessed using a combination of spatially structured validation and a leave-one-year-out scheme to explicitly examine spatial and temporal generalization. The results indicate that ensemble tree-based models outperform linear approaches, with the HGB model showing the strongest agreement with the MODIS-derived GPP proxy (R2 = 0.95, RMSE = 0.035 on the test set) and maintaining stable performance across spatial and temporal validations (R2 = 0.86–0.96 across years). 
Taken together, the findings demonstrate that integrating multi-source remote sensing data with climatic information within a rigorous validation framework enables a more reliable assessment of model generalization and gap-filling consistency with respect to a remote-sensing-based proxy target, rather than an absolute validation against ground-based measurements, thereby supporting sustainability-relevant monitoring of arid grassland ecosystems.
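
The leave-one-year-out scheme mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `leave_one_group_out` and the example years are hypothetical, and the study's actual splitting code is not described in the abstract. The idea is that all samples from one year form the test set while the remaining years form the training set, which probes temporal generalization more directly than random cross-validation.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group.

    Hypothetical helper for illustration; `groups` holds one group label
    (for example an acquisition year) per sample.
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            idx
            for group, indices in by_group.items()
            if group != held_out
            for idx in indices
        )
        yield held_out, train_idx, test_idx


# Assumed example: the acquisition year of six pixel-level samples.
sample_years = [2019, 2019, 2020, 2021, 2021, 2020]
year_splits = list(leave_one_group_out(sample_years))
```

Each element of `year_splits` pairs one held-out year with the indices used for training and testing, so no sample from the test year leaks into training.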

  • New
  • Research Article
  • 10.3389/frai.2026.1690664
Web-based cardiovascular disease risk prediction using machine learning
  • Feb 13, 2026
  • Frontiers in Artificial Intelligence
  • Suraiya Akhter + 1 more

Cardiovascular disease (CVD) remains the foremost contributor to global illness and death, underscoring the critical need for effective tools that can predict risk at early stages to support preventive care and timely clinical decisions. With the growing complexity of healthcare data, machine learning has shown considerable promise in extracting insights that enhance medical decision-making. Nonetheless, the effectiveness and clarity of machine learning models largely rely on the relevance and quality of input features. In this work, we explored and compared four feature-selection strategies—Pearson correlation + Chi-squared test, Alternating Decision Tree (ADT)-based scoring, Cross-Validated Feature Evaluation (CVFE), and Hypergraph-Based Feature Evaluation (HFE)—to identify the most predictive factors for CVD risk. Our analysis utilized data from the National Health and Nutrition Examination Survey (NHANES), administered by the National Center for Health Statistics under the Centers for Disease Control and Prevention (CDC), encompassing demographic, clinical, laboratory, and survey data collected across the U.S. from August 2021 through August 2023. Distinct sets of features obtained through these selection techniques were used to develop random forest (RF), support vector machine (SVM), and eXtreme Gradient Boosting (XGBoost) models, which were then assessed for predictive effectiveness. To improve clarity and understanding of model decision-making, SHapley Additive exPlanations (SHAP) was used to interpret feature contributions in the top-performing model. Among the evaluated methods, the HFE approach combined with SVM achieved the highest overall accuracy (82.84%) and AUC (0.9027), outperforming both classical and alternative strategies. 
The most influential predictors included age, total cholesterol, history of high blood pressure, use of cholesterol-lowering medication, recent prescription medication use, lifetime smoking history, family income-to-poverty ratio, gender, educational attainment, and red cell distribution width. The web application, accessible at https://shiny.tricities.wsu.edu/cvdr-prediction/ , presents predictive results, probability scores, and SHAP plots generated from the model trained using the feature set selected by the hypergraph-based approach. This study highlights the importance of strategic feature selection in refining predictive accuracy and interpretability, offering a practical data-driven approach that could aid clinicians in evaluating cardiovascular risk and tailoring preventive care.

  • New
  • Research Article
  • 10.1017/neu.2026.10063
Predicting the need for electroconvulsive therapy via machine learning trained on electronic health record data.
  • Feb 13, 2026
  • Acta neuropsychiatrica
  • Lasse Hansen + 4 more

Electroconvulsive therapy (ECT) is an effective treatment of severe manifestations of mental illness. Since delay in initiation of ECT can have detrimental effects, prediction of the need for ECT could improve outcomes via more timely treatment initiation. Therefore, this study aimed to predict the need for ECT following admission to a psychiatric hospital. This study was based on electronic health record (EHR) data from routine clinical practice. Adult patients admitted to a hospital within the Psychiatric Services of the Central Denmark Region between January 2013 and November 2021 were included in the study. The outcome was initiation of ECT >7 days (to not include patients admitted for planned ECT) and ≤67 days after admission. The data was randomly split into an 85% training set and a 15% test set. On the 7th day of the inpatient stay, machine learning models (extreme gradient boosting) were trained to predict initiation of ECT and subsequently tested on the test set. The cohort consisted of 41,610 patients with 164,961 admissions. In the held out test set, the trained model predicted ECT initiation with an area under the receiver operating characteristic curve of 0.94, 47% sensitivity, 98% specificity, positive predictive value of 24% and negative predictive value of 99%. The top predictors were the highest suicide assessment score and mean Brøset violence checklist score in the preceding three months. EHR data from routine clinical practice may be used to predict need for ECT. This may lead to more timely treatment initiation.
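
The combination of high specificity and a modest positive predictive value reported above is typical of rare outcomes. The Python sketch below shows the standard relationship between sensitivity, specificity, prevalence, and the predictive values. The prevalence of 1.3% used in the example is an assumption chosen purely for illustration; the abstract does not report the outcome prevalence, and the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity are taken from the abstract;
# the prevalence is an ASSUMED value, not a reported one.
ppv, npv = predictive_values(sensitivity=0.47, specificity=0.98, prevalence=0.013)
```

Under this assumed prevalence the sketch yields a PPV of about 0.24 and an NPV of about 0.99, which shows how a model with 98% specificity can still produce mostly false alarms among its positive predictions when the outcome is uncommon.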

  • New
  • Research Article
  • 10.1785/0120250211
Constraining the Earthquake Focal Depth Distribution in the Southern Korean Peninsula
  • Feb 13, 2026
  • Bulletin of the Seismological Society of America
  • Dong-Hoon Sheen + 2 more

The accurate determination of earthquake focal depth is critical for understanding regional seismic processes, characterizing seismogenic behavior, and assessing seismic hazard. However, precise focal depth determination remains a significant challenge owing to theoretical limitations, sparse station geometry, uncertainties in velocity models, and errors in arrival-time picking. This study evaluated the uncertainty of earthquake focal depths in the southern Korean Peninsula using seismic station geometry and investigated regional seismogenic characteristics. We selected 304 earthquakes with magnitudes between 2.0 and 4.9 recorded from 2018 to 2022, for which P- and S-wave arrivals were manually picked from dense seismic stations. To quantify uncertainty, we conducted Monte Carlo simulations using synthetic datasets generated with multiple velocity models, random sampling of station locations, and varied initial focal depths. Gradient boosting analysis identified the minimum distance to a station and the number of near-epicentral P and S arrivals as the dominant factors reducing focal depth errors. We propose geometry-based criteria that allow approximately 95% of local crustal events to be located with focal depth errors within 5 km and epicentral errors within 2 km: (1) at least seven stations within 100 km, including two within 50 km and one within 10 km of the epicenter; (2) at least one S wave within 50 km; and (3) primary and secondary azimuthal gaps less than 160° and 220°, respectively. These criteria are region-specific, and further validation is required for application elsewhere. Applying these constraints revealed a bimodal focal depth distribution in the southern Korean Peninsula, with primary concentrations at 5–12 and 12–22 km. Shallow earthquakes occur widely across the southern Korean Peninsula, whereas deeper events are preferentially located near the boundaries of the Okcheon fold belt and in the Gyeongsang basin.
Our findings also highlight the need for caution when interpreting offshore focal depths, which may be biased by insufficient station geometry.

  • New
  • Research Article
  • 10.1088/2053-1591/ae45f4
Predictive Analysis and Performance Measures Optimisation in Sustainable Abrasive Jet Machining
  • Feb 13, 2026
  • Materials Research Express
  • Ajit Mohan Gaonkar + 1 more

Within the United Nations Sustainable Development Goals (SDGs), Goal 9 calls for industries to be made sustainable through increased resource-use efficiency and adoption of environmentally sound processes, while Goal 12 recognises that sustainable manufacturing promotes responsible production and reduces waste. Building on this motivation, the present study demonstrates the feasibility of using unprocessed beach sand as an abrasive for precision through-hole drilling of mild steel in Abrasive Jet Machining (AJM), supported by predictive modelling and multi-objective optimisation. A custom AJM system was developed, and experiments were conducted using a Taguchi L27 orthogonal array to investigate the effects of control parameters on responses: Material Removal Rate (MRR) and Kerf Taper Angle (KTA). Experimental runs revealed stable MRR and reduced KTA, outcomes which were not previously reported for ductile steel using natural abrasives. Random Forest and Extreme Gradient Boosting (XGBoost) models achieved high predictive accuracy (R² > 0.95 for MRR, > 0.80 for KTA), while independent multi-criteria decision-making (MCDM) methods converged on the same optimal parameter set. Scanning electron microscopy (SEM) images at these settings showed sharper edges and improved surface integrity. By replacing conventional abrasives such as silicon carbide and aluminium oxide with locally available beach sand, the work addresses resource efficiency and waste reduction. The integration of sustainable abrasive selection, robust predictive modelling, and decision-driven optimisation may thus present a viable pathway towards greener, high-precision machining of ductile metals.

  • New
  • Research Article
  • 10.3390/agriengineering8020065
Varietal Identification and Yield Estimation in Potatoes Using UAV RGB Imagery in the Southern Highlands of Peru
  • Feb 12, 2026
  • AgriEngineering
  • Miguel Tueros + 9 more

The cultivation of potatoes is essential for rural food security, and the use of Unmanned Aerial Vehicle Red-Green-Blue (UAV-RGB) imagery allows for precise and cost-effective estimation of yield and identification of varieties, overcoming the limitations of manual assessment. We evaluated four INIA varieties (Bicentenario, Canchán, Shulay and Tahuaqueña) by integrating agronomic measurements (height, number and weight of tubers, leaf health) with color and textural indices derived from RGB orthomosaics. Yield prediction was modeled using Random Forest (RF) and Gradient Boosting (GB); varietal identification was approached with (i) a Convolutional Neural Network (CNN) that classifies RGB images and (ii) classical models such as Random Forest, Support Vector Machines (SVMs), K-Nearest Neighbors (KNNs), Decision Trees and Logistic Regression trained on EfficientNetB0 embeddings. The results showed significant genotypic differences in yield (p < 0.001): Tahuaqueña 13.86 ± 0.27 t ha−1 and Bicentenario 6.65 ± 0.27 t ha−1. The number of tubers (r = 0.52) and plant height (r = 0.23) correlated with yield; RGB indices showed low correlations (r < 0.3) and high redundancy (r > 0.9). RF achieved a better fit (Coefficient of determination, R2 = 0.54; Root Mean Square Error, RMSE = 2.72 t ha−1), excelling in stolon development (R2 = 0.66) and losing precision in maturation due to foliar senescence. In classification, the CNN and RF on embeddings achieved F1-macro ≈ 0.69 and 0.66 (Receiver Operating Characteristic—Area Under the Curve, ROC AUC RF = 0.89), with better identification of Bicentenario and Shulay. We conclude that UAV-RGB is a cost-effective alternative for phenotypic monitoring and varietal selection in high Andean contexts. These findings support the integration of UAV-RGB imagery into breeding and monitoring pipelines in resource-limited Andean systems.

  • New
  • Research Article
  • 10.1080/15435075.2026.2628952
Power curve estimation and feature importance quantification for offshore wind turbines based on XGBoost regression
  • Feb 12, 2026
  • International Journal of Green Energy
  • Gürkan Aydemir + 1 more

Offshore wind power is a critical component of the global transition to renewable energy. However, the accuracy of power curve prediction, essential for both resource assessment and operational monitoring, is significantly hindered by the unique challenges of the marine environment, such as volatile wind conditions and complex nonlinear turbine dynamics. To overcome these limitations, this study presents a novel framework with a twofold methodological contribution. First, a meticulously optimized eXtreme Gradient Boosting (XGBoost) model is developed, establishing a new state-of-the-art performance benchmark for predicting offshore wind power using only standard environmental sensor data. Second, this high-performing model is leveraged to conduct a novel comparative analysis that reveals the fundamentally different feature dependencies of offshore versus inland turbines. This analysis uncovers the distinct environmental drivers crucial for context-specific modeling, an insight previously unexplored in the literature. Validation against real-world data demonstrates the model’s superiority; the proposed XGBoost approach achieved a Root Mean Square Error (RMSE) of 0.07422 for offshore prediction. This represents a significant performance improvement, reducing the error by 4.7% compared to the next-best model, k-Nearest Neighbor regression (kNN, RMSE 0.0777), and by up to 39% compared to the traditional Binning method (RMSE 0.12117). Consequently, the engineering value of this work lies in its dual achievement: it significantly improves the accuracy of power curve modeling for crucial industry tasks while accomplishing this with low-cost, readily available data. This positions the proposed approach as a practical and economically viable tool for enhancing the operational efficiency and reliability of offshore wind farms.

  • New
  • Research Article
  • 10.1038/s41598-026-36424-2
Integrating machine learning and explainable AI for employee attrition prediction in HR analytics
  • Feb 12, 2026
  • Scientific Reports
  • Maytha Al-Ali + 5 more

Employee attrition poses significant challenges to organizations, impacting productivity, morale, and financial stability. Predicting attrition and understanding its underlying drivers are critical for implementing effective retention strategies. In this study, we propose a comprehensive framework that utilizes advanced machine learning techniques to predict employee attrition and job change likelihood. The framework integrates robust preprocessing pipelines, state-of-the-art predictive models, and explainability tools such as SHAP (SHapley Additive exPlanations) to ensure transparency and fairness in HR analytics. By addressing key challenges such as class imbalance, feature selection, and model interpretability, our approach provides actionable insights for proactive talent management. We evaluate the framework on multiple datasets (including the IBM HR Analytics Employee Attrition & Performance dataset and the HR Analytics: Job Change of Data Scientists dataset), achieving near-optimal performance metrics across diverse scenarios. Notably, the Adaptive Boosting (AB) and Histogram Gradient Boosting (HGB) models demonstrate superior performance, with high Precision, Recall, F1-score, and Accuracy. Global and local interpretability analyses using SHAP visualizations reveal critical predictors of attrition, such as OverTime, JobLevel, and JobSatisfaction, enabling targeted interventions. The results underscore the framework’s adaptability, scalability, and potential for real-time deployment in organizational settings. This study contributes to advancing HR analytics by bridging gaps in predictive accuracy, interpretability, and generalizability; offering practical solutions for mitigating employee turnover and safeguarding human capital investments.

  • New
  • Research Article
  • 10.1142/s0218213026400038
Beyond Accuracy: A Comprehensive Comparative Study of Gradient Boosting versus Tabular Deep Learning and Explainability Techniques for Mixed-Type Tabular Data Models Using SHAP and LIME
  • Feb 11, 2026
  • International Journal on Artificial Intelligence Tools
  • Alina Lazar + 2 more

The goal of this study was to evaluate the performance of traditional gradient boosting (GB) and neural network models on diverse tabular datasets that differ in scale, class balance, and feature composition (numerical, categorical, or mixed). We focused on six representative datasets: adult census income, bank marketing, credit card fraud, breast cancer diagnosis, diabetes, and in-vehicle coupon recommendation, each with distinct challenges related to dimensionality, sample size, and heterogeneity. We benchmark the predictive performance of XGBoost and LightGBM (gradient boosting models) against Multilayer Perceptrons (MLP), Tabular Transformers, and tabular prior-data fitted network (TabPFN), using metrics such as accuracy, F1 score, ROC-AUC, and log loss. To ensure transparency and interpretability, we applied SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanation (LIME) to all models and evaluated the explanation quality using stability, fidelity, and consistency criteria. Our findings confirm that gradient boosting models consistently achieve the best balance of performance, calibration, and interpretability across heterogeneous and imbalanced datasets. SHAP-based insights show that gradient boosting (GB) models provide more stable and interpretable feature attributions, making them well suited for high-stakes domains such as finance and healthcare. These results emphasize the practical advantages of gradient boosting methods for structured data tasks and highlight the interpretability limitations of deep learning models when applied to tabular datasets. Future work will explore hybrid architectures and pretraining strategies to close this performance gap.

  • New
  • Research Article
  • 10.3390/rs18040563
Machine-Learning Crop-Type Mapping Sensitivity to Feature Selection and Hyperparameter Tuning
  • Feb 11, 2026
  • Remote Sensing
  • Mayra Perez-Flores + 9 more

To improve crop yields and incomes, farmers consistently adapt their practices to climate and market fluctuations, resulting in highly variable crop field distribution and coverage in space and time. As these dynamics illustrate farmers’ challenges, up-to-date crop-type mapping is essential for understanding farmers’ needs and supporting their adoption of sustainable practices. With global coverage and frequent temporal observations, remote sensing data are generally integrated into machine learning models to monitor crop dynamics. Unlike physical-based models that rely on straightforward use, implementing machine learning models requires extensive user interaction. In this context, this study assesses how sensitive the models’ outputs are to feature selection and hyperparameter tuning, as both processes rely on user judgment. To achieve this, Sentinel-1 (S1) and Sentinel-2 (S2) features are integrated into five distinct models (Random Forest (RF), Support Vector Machine (SVM), Light Gradient Boosting (LGB), Histogram-based Gradient Boosting (HGB), and Extreme Gradient Boosting (XGB)), considering several features selection (Variance Inflation Factor (VIF) and Sequential Feature Selector (SFS)) and hyperparameter tuning (Grid-Search) setup. Results show that the preprocess modeling feature selection (VIF) discards the features that the wrapped method (SFS) keeps, resulting in less reliable crop-type mapping. Additionally, hyperparameter tuning appears to be sensitive to the input features, and considering it after any feature selection improved the crop-type mapping. In this context a three-step nested modeling setup, including first hyperparameter tuning, followed by a wrapped feature selection (SFS) and additional hyperparameter tuning, leads to the most reliable model outputs. 
For the study region, LGB and XGB (SVM) are the most (least) suitable models for crop-type mapping, and model reliability improves when integrating S1 and S2 features rather than considering S1 or S2 alone. Finally, crop-type maps are derived across different regions and time periods to highlight the benefits of the proposed method for monitoring crop dynamics in space and time.
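
The three-step nested setup described in this abstract (tune, select features with a wrapper method, tune again) can be sketched in plain Python. This is only an illustrative sketch, not the authors' pipeline: the function names, the feature names, the `max_depth` grid, and the `toy_score` function are all hypothetical stand-ins, and in practice the score would be something like cross-validated accuracy of a real classifier.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Params = Dict[str, int]
ScoreFn = Callable[[Sequence[str], Params], float]


def grid_search(features: Sequence[str], grid: Dict[str, List[int]], score: ScoreFn) -> Params:
    """Return the parameter combination with the highest score for a fixed feature set."""
    names = sorted(grid)
    candidates = [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]
    return max(candidates, key=lambda params: score(features, params))


def forward_selection(all_features: Sequence[str], params: Params,
                      score: ScoreFn, n_select: int) -> List[str]:
    """Greedy wrapper selection: repeatedly add the feature that raises the score most."""
    selected: List[str] = []
    remaining = list(all_features)
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: score(selected + [f], params))
        selected.append(best)
        remaining.remove(best)
    return selected


def nested_setup(all_features: Sequence[str], grid: Dict[str, List[int]],
                 score: ScoreFn, n_select: int):
    """Step 1: tune on all features. Step 2: wrapper selection. Step 3: tune again."""
    first_params = grid_search(all_features, grid, score)
    selected = forward_selection(all_features, first_params, score, n_select)
    final_params = grid_search(selected, grid, score)
    return selected, final_params


# Hypothetical per-feature usefulness values, invented purely for this illustration.
TOY_GAIN = {"S1_VV": 0.3, "S2_NDVI": 0.5, "S2_B8": 0.1, "noise": -0.2}


def toy_score(features: Sequence[str], params: Params) -> float:
    """Made-up score whose best max_depth depends on how many features are used."""
    gain = sum(TOY_GAIN[f] for f in features)
    penalty = 0.01 * abs(params["max_depth"] - 2 * len(features))
    return gain - penalty


selected, final_params = nested_setup(
    list(TOY_GAIN), {"max_depth": [2, 4, 6, 8]}, toy_score, n_select=2
)
print(selected, final_params)
```

In this toy setting the best `max_depth` changes once the feature set shrinks, which mirrors the abstract's observation that hyperparameter tuning is sensitive to the input features.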

  • New
  • Research Article
  • 10.4108/airo.10265
A Stacking Based Ensemble Learning Approach for Accurate Identification of Tumor Homing Peptides in Precision Cancer Therapeutics
  • Feb 11, 2026
  • EAI Endorsed Transactions on AI and Robotics
  • Jahid Hassan Akash + 5 more

The identification of tumor-homing peptides (THPs) plays a pivotal role in the development of targeted cancer therapies and precision medicine. Current THP identification methods still suffer from limited feature representation, moderate predictive performance, and insufficient generalization, highlighting the need for more robust ensemble frameworks. In this study, we propose STHPP, a stacking-based ensemble machine learning approach designed to improve the accuracy and reliability of THP discovery. Two benchmark datasets, referred to as the "main" and "small" datasets of Shoombuatong, were collected, merged, and pre-processed to create a larger dataset, which was then split for training and testing. The STHPP model applies a two-layer ensemble architecture: the first layer aggregates three heterogeneous base classifiers, Random Forest (RF), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost), and the second layer applies CatBoost as a meta-classifier that post-processes the predictions of the base models. The two-layer architecture uses model diversity and ensemble learning concepts to enhance generalization performance. The proposed STHPP framework achieved an accuracy of 0.98, precision of 0.97, sensitivity of 0.99, specificity of 0.97, and a Matthews Correlation Coefficient (MCC) of 0.98. These results exceed those of current state-of-the-art approaches, which illustrates the effectiveness of the stacking strategy in complicated peptide classification problems. The findings showcase the potential of STHPP as a strong and scalable computational platform for advancing peptide-based drug discovery research and targeted oncology.
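
The evaluation metrics named in this abstract all follow from a binary confusion matrix. The sketch below shows the standard formulas; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes both classes occur and both are predicted at least once,
    so that none of the ratio denominators is zero.
    """
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # recall, true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are imbalanced.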

  • New
  • Research Article
  • 10.1007/s12094-026-04230-x
Establishment of insulin resistance-related ten-gene signature in endometrial cancer and identification of ACTL8 as a prognostic and immunological biomarker.
  • Feb 11, 2026
  • Clinical & translational oncology : official publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico
  • Siyun Lu + 5 more

Endometrial cancer (EC) is a common gynecological tumor. Insulin resistance (IR) increases the risk of EC. However, the common molecular basis between the two remains unclear. This study aims to screen the differentially expressed genes (DEGs) shared by the two diseases and construct a prognostic risk model. We obtained gene expression profiles and clinical information of patients with IR and EC from GEO and TCGA datasets. We performed differential analysis to discover the shared DEGs between IR and EC. Subsequently, the interactions among overlapping DEGs, along with their biological functions and genetic mutations in EC, were comprehensively analyzed via protein-protein interaction (PPI) network, functional enrichment, and genetic mutation analyses. Then, machine-learning algorithms were employed to identify genes significantly associated with survival. For clinical application, we constructed a prognostic risk model and also compared tumor-infiltrating immune cells (TIICs) and genetic mutations between high- and low-risk groups. Finally, we selected one of the most important markers in the prognostic signature to investigate its expression-prognosis pattern, biological function, and underlying mechanism. Our analysis identified 20 co-upregulated genes and 32 co-downregulated genes of IR and EC. In addition, two subnetworks and the top 20 genes were obtained through PPI analysis, while extracellular matrix construction and immune response were the most enriched functions of the DEGs. Filtered by random forest, gradient boosting machine, and extreme gradient boosting, six upregulated markers (ACTL8, WNT7A, CTSV, MMP9, CNIH2, and PLAUR) and four downregulated markers (COL6A6, MYOC, PHLDB1, and FIBIN) were defined as the characteristic genes for the prognosis of EC patients. The risk prediction model constructed from these ten genes had good predictive value for the prognosis of EC patients and was related to immune regulation and genetic mutation.
ACTL8 was further studied as the most significant marker in the 10-gene signature. The correlation between ACTL8 upregulation and poor prognosis in EC patients suggested a carcinogenic effect, which was correlated with its regulation of cilium movement. Our findings suggest that there are common molecular profiles between IR and EC. The IR-related prognostic model represents an excellent prognosis predictor and immune-related biomarker, which can be applied to risk stratification and precise treatment of EC patients with IR.

  • New
  • Research Article
  • 10.3390/math14040626
Efficient and Interpretable Machine Learning for Student Academic Outcome Prediction
  • Feb 11, 2026
  • Mathematics
  • Hongwen Gu + 1 more

Understanding and preventing student dropout presents a decision-critical modeling problem involving heterogeneous variables, nonlinear relationships, and the need for transparent inference. This study addresses the prediction of undergraduate academic outcomes, including Graduation, Enrolled, and Dropout, by proposing an efficient and interpretable machine learning framework that explicitly balances predictive performance, feature efficiency, and algorithmic explainability. The empirical analysis relies on a dataset of 4424 student records across 17 undergraduate programs from the Polytechnic Institute of Portalegre, Portugal. In contrast to existing approaches that rely on high-dimensional input spaces and opaque predictive architectures, we develop a reduced-dimensional classification pipeline based on recursive feature elimination with Gradient Boosting and Random Forest models. Starting from a comprehensive set of demographic, academic, and financial indicators, only 20 informative predictors are retained for model construction, substantially reducing input complexity while preserving predictive capacity. Comparative evaluation across multiple learning algorithms identifies Gradient Boosting as the most effective model, achieving an AUC of 0.891. Beyond predictive accuracy, the proposed framework emphasizes model interpretability through the integration of SHapley Additive exPlanations (SHAP), enabling quantitative attribution of feature contributions at both global and instance levels. The analysis reveals that second-semester academic engagement variables (including the number of courses approved, evaluated, and enrolled), as well as tuition fee payment status and age at enrollment, are the dominant factors shaping student outcomes. Overall, the results demonstrate that strong classification performance can be achieved using a compact feature set while maintaining transparent and explainable model behavior.
By combining mathematically grounded feature selection with principled model explanation, this study advances methodological understanding of how efficiency, interpretability, and predictive accuracy can be jointly optimized in applied machine learning, with implications for decision-support systems in educational analytics.

  • New
  • Research Article
  • 10.3389/fepid.2026.1696282
Hierarchical forecasting of COVID-19 cases in Africa using machine learning models
  • Feb 11, 2026
  • Frontiers in Epidemiology
  • Claris Shoko + 2 more

Introduction: The COVID-19 pandemic posed significant challenges for public health systems, especially in Africa, where data scarcity, inadequate healthcare infrastructure, and regional disparities hindered effective forecasting and response efforts. Conventional forecasting methods have struggled to address the complexity and detail necessary for effective policy interventions at various administrative levels. This study examines the challenge of producing accurate and coherent forecasts of COVID-19 cases within the hierarchical structure of Africa, which includes the continental, regional, and national levels. Methods: We establish a comprehensive forecasting model that uses hierarchical time series forecasting (HTSF) through a bottom-up reconciliation approach augmented by machine learning algorithms. We employ extreme gradient boosting (XGBoost) and random forest models, subsequently improving predictive accuracy via a weighted average ensemble method. We produce forecasts at the national level and then aggregate them to ensure consistency across all hierarchical levels. The models are evaluated in comparison to conventional methods such as ARIMA and exponential smoothing. Results: Empirical findings indicate that XGBoost is the best among all the single forecast models used in this study. Combining forecasts from XGBoost and random forest, with more weight assigned to XGBoost, outperforms all other models in terms of mean absolute error, root mean square error, and mean absolute scaled error. Results further revealed that Southern Africa, despite its low population density, reported the highest number of cases, indicating underlying health vulnerabilities and socioeconomic factors. In summary, the bottom-up HTSF method, when combined with machine learning, serves as an effective tool for forecasting in environments with limited data availability.
Discussion: It is advisable to apply similar models to other infectious diseases and to expand their use to guide health interventions, resource allocation, and early warning systems in future pandemics.
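
The weighted average ensemble and bottom-up aggregation described in this abstract can be illustrated with a short sketch. Everything below is assumed for illustration: the country and region labels, the forecast values, the 0.75 weight, and the helper names are hypothetical and do not come from the study.

```python
from typing import Dict, List, Sequence, Tuple


def weighted_ensemble(a: Sequence[float], b: Sequence[float], weight_a: float) -> List[float]:
    """Blend two forecasts point by point; weight_a is the weight given to forecast a."""
    return [weight_a * x + (1.0 - weight_a) * y for x, y in zip(a, b)]


def bottom_up(national: Dict[str, List[float]],
              hierarchy: Dict[str, List[str]]) -> Tuple[Dict[str, List[float]], List[float]]:
    """Sum national forecasts into regional totals, then regional totals into a continental total."""
    regional = {
        region: [sum(step) for step in zip(*(national[c] for c in countries))]
        for region, countries in hierarchy.items()
    }
    continental = [sum(step) for step in zip(*regional.values())]
    return regional, continental


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual: Sequence[float], forecast: Sequence[float], train: Sequence[float]) -> float:
    """MAE scaled by the in-sample MAE of a naive one-step-ahead forecast."""
    naive_mae = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train))) / (len(train) - 1)
    return mae(actual, forecast) / naive_mae


# Hypothetical two-step-ahead national forecasts from two models.
xgb = {"country_a": [100.0, 110.0], "country_b": [50.0, 55.0], "country_c": [20.0, 25.0]}
rf = {"country_a": [120.0, 130.0], "country_b": [40.0, 45.0], "country_c": [30.0, 35.0]}
hierarchy = {"region_1": ["country_a", "country_b"], "region_2": ["country_c"]}

# Give the XGBoost-style forecast the larger weight (0.75 is an arbitrary choice here).
national = {c: weighted_ensemble(xgb[c], rf[c], weight_a=0.75) for c in xgb}
regional, continental = bottom_up(national, hierarchy)
print(regional, continental)
```

Because every upper level is computed as a sum of the national forecasts, the regional and continental figures are coherent by construction, which is the point of bottom-up reconciliation.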

