Deep learning ensemble models for CT-based differentiation of malignant and benign sacral bone tumors: development and evaluation.
Radiologists often face challenges in differentiating benign from malignant sacral bone lesions due to their similar imaging characteristics. This study aimed to develop an ensemble deep learning (DL) model that can preoperatively distinguish between benign and malignant sacral tumors using noncontrast computed tomography images. Preoperative sacral CT scans from 569 patients with confirmed sacral lesions were analyzed. Data from Center 1 were utilized in model development and internal test via fivefold cross-validation, and those from Centers 2 and 3 were employed in external test. Various ensemble models combining human-readable interpretation and DL were developed. The diagnostic performance of the models and radiologists was assessed using metrics such as precision, recall, accuracy, area under the curve (AUC), F1 score, and confusion matrix. Furthermore, the clinical benefits derived from radiologists' interpretations and supported by the DL model were evaluated. The ensemble model, which integrates 3D-DenseNet121 with human interpretation, exhibited the most robust performance. The ensemble model demonstrated high performance on the internal and external test sets and achieved AUCs of 0.9139 and 0.8713, F1 scores of 0.9054 and 0.8571, precision of 0.9041 and 0.8824, recall of 0.9136 and 0.8333, and accuracy of 0.8630 and 0.8182, respectively. Across the external test cohort, all radiologists experienced improvements in AUC, accuracy, sensitivity, and specificity. Notably, junior radiologists demonstrated significant improvements compared with senior radiologists. The potential clinical application of the DL model lies in its capacity to considerably enhance the diagnostic efficiency of radiologists. 
This study presents the first ensemble deep learning model integrating 3D-DenseNet121 with radiologists' interpretation for preoperative differentiation of sacral tumors on noncontrast CT that improved diagnostic performance across all experience levels, particularly for junior radiologists. First artificial intelligence-radiologist ensemble for noncontrast computed tomography (NCCT)-based sacral tumor classification. Boosts all radiologists' performance, with the greatest gains for juniors, potentially reducing referrals. Enables reliable NCCT diagnosis, overcoming contrast/magnetic resonance imaging dependency in musculoskeletal oncology.
- Research Article
1
- 10.1186/s12885-025-15121-9
- Oct 29, 2025
- BMC Cancer
Background: Occult pleural dissemination (PD) in non-small cell lung cancer (NSCLC) patients is likely to be missed on computed tomography (CT) scans, is associated with poor survival, and is generally a contraindication for radical surgery. This study aimed to develop and compare the performance of radiomics-based machine learning (ML), deep learning (DL), and fusion models to preoperatively identify occult PD in NSCLC patients. Materials and methods: A total of 326 NSCLC patients from three Chinese high-volume medical centers (2016–2023) were retrospectively collected and divided into training (n = 216), internal test (n = 54), and external test (n = 56) cohorts. Ten radiomics-based ML models and eight DL models were trained using CT images at the maximum cross-sectional slice of the primary tumor. In addition, two fusion models (prefusion and postfusion) were developed using feature-based and decision-based methods. The receiver operating characteristic (ROC) curve and area under the curve (AUC) were mainly used to compare the predictive performance of the models. Results: The GBM (AUC: 0.821) and DenseNet121 (AUC: 0.764) models achieved the highest AUC among the ML and DL models, respectively, in the external test cohort. The postfusion model, integrating the output probabilities from the GBM and DenseNet121 models, showed superior performance (AUC: 0.828–0.978) compared to the prefusion model (AUC: 0.817–0.877). Moreover, the postfusion model demonstrated the highest sensitivity (82.1–97.2%) among all models across the three cohorts. Conclusions: The postfusion model, which integrates radiomics-based ML and DL models, can serve as a sensitive diagnostic tool to predict occult PD in NSCLC patients, thereby helping to avoid unnecessary surgeries. Supplementary Information: The online version contains supplementary material available at 10.1186/s12885-025-15121-9.
- Research Article
10
- 10.1016/j.actatropica.2024.107277
- Jun 13, 2024
- Acta Tropica
An emerging network for COVID-19 CT-scan classification using an ensemble deep transfer learning model
- Research Article
- 10.1007/s00261-025-05282-3
- Nov 11, 2025
- Abdominal radiology (New York)
The aim of this study was to investigate the diagnostic performance of a 2.5-dimensional (2.5D) ensemble deep learning (DL) model based on 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) images in predicting lymphovascular invasion (LVI) in colorectal cancer (CRC) patients. In this retrospective study, 177 CRC patients who underwent preoperative 18F-FDG PET/computed tomography were enrolled and assigned to the training cohort or the internal test cohort. Three inputs were determined according to the manually delineated tumor volume of interest (VOI): the PrimaryLesion-2.5D input containing only the tumor VOI, the ProximalPeritumoral-2.5D input extending 10 mm outward from the VOI boundary, and the DistalPeritumoral-2.5D input extending 20 mm outward from the VOI boundary. Five common DL models, namely VGG16, GoogLeNet, ResNet50, DenseNet201, and Vision Transformer, were evaluated. A support vector machine was used to integrate the outputs of the models with good prediction performance to establish the Fusion model. The Radiomics and Clinical models were constructed for comparative analysis. Model performance was analyzed statistically using the area under the curve (AUC), accuracy, F1-score, parameter count, and inference time. The ProximalPeritumoral-DenseNet201 model (training cohort: AUC = 0.840, accuracy = 0.772, F1-score = 0.714; internal test cohort: AUC = 0.738, accuracy = 0.796, F1-score = 0.645; parameter count = 18.097M, inference time = 47.500 ms) and the PrimaryLesion-ResNet50 model (training cohort: AUC = 0.746, accuracy = 0.740, F1-score = 0.628; internal test cohort: AUC = 0.733, accuracy = 0.704, F1-score = 0.619; parameter count = 23.512M, inference time = 38.200 ms) achieved an optimal balance between performance and computational efficiency. 
The Fusion model, which combined the ProximalPeritumoral-DenseNet201 and PrimaryLesion-ResNet50 models, improved performance further, with an AUC of 0.874, an accuracy of 0.821, and an F1-score of 0.766 in the training cohort. In the internal test cohort, the AUC was 0.824, the accuracy was 0.815, and the F1-score was 0.722. The Fusion model outperformed the Radiomics and Clinical models. Moreover, it showed good clinical utility and calibration. The 2.5D ensemble DL model based on 18F-FDG PET images performed well for the prediction of LVI in CRC, demonstrating its potential as a precision medicine support tool for CRC patients.
- Research Article
11
- 10.1186/s13244-024-01610-1
- Feb 7, 2024
- Insights into Imaging
Objectives: To develop a deep learning (DL) model for differentiating between osteolytic osteosarcoma (OS) and giant cell tumor (GCT) on radiographs. Methods: Patients with osteolytic OS and GCT proven by postoperative pathology were retrospectively recruited from four centers (center A, training and internal testing; centers B, C, and D, external testing). Sixteen radiologists with different levels of experience in musculoskeletal imaging diagnosis were divided into three groups and participated with or without the DL model’s assistance. The DL model was built using the EfficientNet-B6 architecture, and the clinical model was trained using clinical variables. The performance of the various models was compared using McNemar’s test. Results: Three hundred thirty-three patients were included (mean age, 27 years ± 12 [SD]; 186 men). Compared to the clinical model, the DL model achieved a higher area under the curve (AUC) in both the internal (0.97 vs. 0.77, p = 0.008) and external test set (0.97 vs. 0.64, p < 0.001). In the total test set (including the internal and external test sets), the DL model achieved higher accuracy than the junior expert committee (93.1% vs. 72.4%; p < 0.001) and was comparable to the intermediate and senior expert committee (93.1% vs. 88.8%, p = 0.25; 87.1%, p = 0.35). 
With DL model assistance, the accuracy of the junior expert committee was improved from 72.4% to 91.4% (p = 0.051). Conclusion: The DL model accurately distinguished osteolytic OS and GCT with better performance than the junior radiologists, whose own diagnostic performances were significantly improved with the aid of the model, indicating the potential for the differential diagnosis of the two bone tumors on radiographs. Critical relevance statement: The deep learning model can accurately distinguish osteolytic osteosarcoma and giant cell tumor on radiographs, which may help radiologists improve the diagnostic accuracy for the two types of tumors. Key points: • The DL model shows robust performance in distinguishing osteolytic osteosarcoma and giant cell tumor. • The diagnostic performance of the DL model is better than junior radiologists’. • The DL model shows potential for differentiating osteolytic osteosarcoma and giant cell tumor.
- Research Article
- 10.1186/s12944-025-02820-2
- Dec 20, 2025
- Lipids in Health and Disease
Cardiometabolic multimorbidity (CMM) has become an increasing global public health challenge. In China, the prevalence of CMM is rising rapidly among middle-aged and older adults, with estimates ranging from 11.6% to 16.9%, posing a substantial burden on both individuals and healthcare systems. However, effective tools for predicting individual risk of CMM remain limited, hindering timely prevention and intervention. This study used data from the China Health and Retirement Longitudinal Study (CHARLS) between 2011 and 2015, including 7,913 participants aged ≥ 45 years without CMM at baseline. Incident CMM events were identified during the 2015 follow-up based on self-reported diagnoses of cardiometabolic diseases. Ten lipid metabolism biomarkers and derived composite indices (TC, TG, LDL-C, HDL-C, TyG, TyG-BMI, LAP, CTI, non-HDL-C, and RC) were evaluated. Predictive models were estimated using logistic regression, random forest, gradient boosting machine, eXtreme Gradient Boosting (XGBoost), support vector machine, naïve Bayes, deep learning (DL), and an ensemble model. The dataset was randomly split into training (75%) and validation (25%) subsets. Model discrimination was assessed using ROC curves and Area Under the Curve (AUC); calibration was evaluated with calibration plots and Brier scores; classification performance was examined using confusion matrices. Decision curve analysis (DCA) and clinical impact curves (CIC) were applied to assess clinical utility across risk thresholds. Feature importance ranking and SHapley Additive exPlanations (SHAP) were used to quantify variable contributions, marginal effects, and feature interactions. In addition, regional variations in CMM incidence were illustrated using choropleth maps, and correlations between lipid markers and CMM prevalence were analyzed with Pearson coefficients and heatmaps. Over the four-year follow-up, 1,355 participants (17.1%) developed CMM. 
Compared with controls, incident cases were older, had a higher proportion of women and urban residents, and showed higher BMI. They also had significantly elevated triglycerides (126.6 vs. 101.8 mg/dL), reduced HDL-C (45.2 vs. 50.3 mg/dL, P < 0.001), and increased TyG-BMI and LAP (P < 0.001). Geographical analysis revealed markedly higher CMM incidence in northern cold regions (> 40%) than in southern regions (< 20%). The ensemble model achieved robust predictive performance (AUC = 0.715), followed closely by the DL model (AUC = 0.716) and GBM (AUC = 0.714). These non-linear models consistently outperformed GLM (AUC = 0.696), SVM (AUC = 0.696), and XGBoost (AUC = 0.683). Ensemble, DL, and RF models also demonstrated the best calibration (lowest Brier score, 0.125) and provided the greatest net benefit across risk thresholds. SHAP analysis indicated that composite indices, particularly TyG-BMI, LAP, and TyG, contributed most to risk prediction, whereas HDL-C exerted a protective effect. In contrast, traditional single lipid markers such as LDL-C and TC ranked lower in predictive importance. This study demonstrates that machine learning models incorporating lipid metabolism biomarkers and derived indices can predict the risk of CMM. Composite indicators such as TyG and LAP, which capture insulin resistance and visceral adiposity, showed superior predictive value. DL and ensemble models provided higher discrimination and clinical utility compared with traditional approaches. These models may enable early identification of high-risk individuals, underscoring the importance of lipid and metabolic management in CMM prevention, with potential implications for clinical decision-making and public health strategies.
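The abstract above reports calibration as a Brier score of 0.125. As a reading aid, here is a minimal sketch of how a Brier score is computed for binary outcomes: the mean squared difference between each predicted probability and the observed 0/1 outcome. The toy data and function names are hypothetical and not taken from the study. The second helper gives the score of a forecast that always predicts the event rate; applying it to the 17.1% incidence assumes the validation subset has the same incidence as the full cohort, which the abstract does not state.

```python
def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(outcomes) != len(probabilities) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, probabilities)]
    return sum(squared_errors) / len(squared_errors)


def constant_forecast_brier(prevalence):
    """Brier score of a forecast that always predicts the event rate itself."""
    return prevalence * (1 - prevalence)


# Hypothetical toy data, for illustration only (not from the study).
toy_outcomes = [1, 0, 0, 1, 0]
toy_probabilities = [0.8, 0.2, 0.1, 0.6, 0.3]
toy_score = brier_score(toy_outcomes, toy_probabilities)

# Uninformative reference, assuming a 17.1% event rate in the evaluated subset.
reference_score = constant_forecast_brier(0.171)

print(f"toy Brier score: {toy_score:.3f}")
print(f"constant-forecast reference: {reference_score:.3f}")
```

Lower is better. Under the stated assumption the constant forecast scores about 0.142, which gives a rough sense of scale for the reported 0.125.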
- Research Article
26
- 10.1007/s11356-022-24065-7
- Nov 12, 2022
- Environmental Science and Pollution Research
This contribution presents a novel methodology based on feature selection, ensemble deep learning (EDL) models, and an active learning (AL) approach for predicting land subsidence (LS) hazard and rate, and its uncertainty, in an area involving two important plains, the Minab and Shamil-Nian plains, in the Hormozgan province, southern Iran. The important features controlling LS hazard were identified by ridge regression. Then, two EDL models were constructed by stacking (SEDL) and voting (VEDL) five dense deep learning (DL) models (model 1 to model 5) for mapping LS hazard. Thereafter, the predictive model performance was assessed by a precision-recall curve and a Kolmogorov-Smirnov (KS) plot. A partial dependence plot (PDP), individual conditional expectation plots (ICEP), game theory, and a sensitivity analysis were used for the interpretability of the predictive DL model. According to SEDL, the model with higher accuracy, 34% (1624 km²), 14.7% (698 km²), and 19.2% (912 km²) of the total area were classified as being of very low, low, and moderate hazard, whereas 17.7% (845 km²) and 14.4% (683 km²) of the area were classified as being of high and very high hazard, respectively. Based on all interpretability techniques, aquifer loss or groundwater drawdown is the most important feature controlling LS hazard, and it has the greatest impact on the SEDL model output. Based on a Taylor diagram and R² as model performance assessment indicators, SEDL-AL (with R² > 95% for training and test datasets) performed better than SEDL for quantifying LS rate, with the rate of LS ranging between 0 and 48.1 cm. The highest rate of LS occurred in the Minab plain, an area located downstream of the Minab Esteghlal dam. SEDL-AL was used to quantify the uncertainty associated with the LS rate. The observed values fell within the predictions provided by SEDL-AL, which indicates a high accuracy of our predictive model. 
Overall, our newly developed modeling techniques are helpful tools for the spatial mapping of LS susceptibility and rate, and its uncertainty.
- Research Article
- 10.62311/nesx/rphcrcscrcp1
- Jun 29, 2025
- International Journal of Academic and Industrial Research Innovations (IJAIRI)
This study investigates the development and evaluation of AI-based intrusion detection systems (IDS) leveraging ensemble deep learning models. In an era of increasing cyber threats, traditional IDS often struggle with high false positive rates and inadequate adaptability to evolving threats. The proposed research integrates Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), and Autoencoders within an ensemble framework to enhance anomaly detection in network traffic. Using the NSL-KDD dataset, performance metrics such as accuracy, precision, recall, F1-score, and Area Under the Curve (AUC) were analyzed. The ensemble model demonstrated superior performance compared to individual deep learning models. Statistical techniques including regression analysis and ROC curve evaluation confirmed the ensemble model's robustness. This approach provides a scalable and adaptive solution for securing networks against sophisticated cyberattacks. Keywords: Intrusion Detection System, Ensemble Deep Learning, Cybersecurity, CNN, LSTM, Autoencoder, Anomaly Detection
- Research Article
2
- 10.21037/gs-2025-50
- Jul 28, 2025
- Gland Surgery
Background: Current preoperative imaging methods, such as ultrasound, are limited by operator dependency and suboptimal sensitivity for detecting central lymph node metastasis (CLNM). This study aimed to propose a method that integrates deep learning and radiomics to accurately predict lymph node metastasis in thyroid cancer by analyzing intra- and peri-tumoral imaging features, thereby improving the preoperative prediction accuracy. Methods: From July 2020 to June 2022, 405 patients diagnosed with papillary thyroid carcinoma (PTC) were enrolled from two centers: Center 1 (Shanghai Sixth People’s Hospital) with 294 patients divided into a training set (n=294) and an internal validation set, and Center 2 (Tongji Hospital Affiliated to Tongji University) with 111 patients as the external test set. Postoperative pathological confirmation served as the reference standard for CLNM diagnosis. A total of 1,561 radiomics features and 2,048 deep learning features were extracted from intra- and peri-tumoral regions of each ultrasound image. Feature selection was performed using analysis of variance (ANOVA) and least absolute shrinkage and selection operator (LASSO), and the selected features were used to construct support vector machine (SVM) models. Additionally, radiomics-deep learning fusion models were developed by combining the selected radiomics and deep learning features. Results: Among 405 patients (mean age: 46.59±12.74 years; 68.6% female), 171 exhibited CLNM, highlighting the clinical urgency for accurate prediction. The radiomics models demonstrated area under the curve (AUC) values of 0.760 in internal validation and 0.748 in the external test cohort. The deep learning models demonstrated improved performance with AUCs of 0.794 and 0.756 in the internal and external test sets. 
Notably, the highest AUC values of 0.897 (internal validation) and 0.881 (external test set) were obtained by the radiomics-deep learning fusion SVM model incorporating both intra- and peri-tumoral regions. DeLong’s test confirmed statistically significant improvements (P<0.05) of the fusion model over the intra-tumoral radiomics model (P=0.008), intra-tumoral deep learning model (P=0.005), and combined intra-tumoral radiomics-deep learning model (P=0.01). However, no significant differences were observed compared to the combined intra- and peri-tumoral deep learning model (P=0.17). Decision curve analysis indicated that the fusion model offers greater clinical utility in predicting CLNM. Conclusions: The integration of radiomics and deep learning features significantly enhances the diagnostic performance for predicting CLNM in papillary thyroid carcinoma (PTC). The radiomics-deep learning fusion SVM model outperforms individual radiomics and deep learning models, demonstrating substantial potential for clinical application in improving surgical decision-making and patient management. The fusion model could reduce unnecessary central lymph node dissections (CLNDs) and improve surgical planning by providing personalized risk stratification.
- Research Article
1
- 10.1371/journal.pone.0319329
- Apr 16, 2025
- PloS one
Deep learning (DL) has become a powerful tool for the recognition and classification of biological sequences. However, conventional single-architecture models often struggle with suboptimal predictive performance and high computational costs. To address these challenges, we present EnsembleDL-Lipo, an innovative ensemble deep learning framework that combines Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) to enhance the identification of lipocalin sequences. Lipocalins are multifunctional extracellular proteins involved in various diseases and stress responses, and their low sequence similarity and occurrence in the 'twilight zone' of sequence alignment present significant hurdles for accurate classification. These challenges necessitate efficient computational methods to complement traditional, labor-intensive experimental approaches. EnsembleDL-Lipo overcomes these issues by leveraging a set of PSSM-based features to train a large ensemble of deep learning models. The framework integrates multiple feature representations derived from position-specific scoring matrices (PSSMs), optimizing classification performance across diverse sequence patterns. The model achieved superior results on the training dataset, with an accuracy (ACC) of 97.65%, recall of 97.10%, Matthews correlation coefficient (MCC) of 0.95, and area under the curve (AUC) of 0.99. Validation on an independent test set further confirmed the robustness of the model, yielding an ACC of 95.79%, recall of 90.48%, MCC of 0.92, and AUC of 0.97. These results demonstrate that EnsembleDL-Lipo is a highly effective and computationally efficient tool for lipocalin sequence identification, significantly outperforming existing methods and offering strong potential for applications in biomarker discovery.
- Research Article
143
- 10.1158/1078-0432.ccr-19-0374
- Apr 15, 2020
- Clinical Cancer Research
With increasing incidence of renal mass, it is important to make a pretreatment differentiation between benign renal mass and malignant tumor. We aimed to develop a deep learning model that distinguishes benign renal tumors from renal cell carcinoma (RCC) by applying a residual convolutional neural network (ResNet) on routine MR imaging. Preoperative MR images (T2-weighted and T1-postcontrast sequences) of 1,162 renal lesions definitely diagnosed on pathology or imaging in a multicenter cohort were divided into training, validation, and test sets (70:20:10 split). An ensemble model based on ResNet was built combining clinical variables and T1C and T2WI MR images using a bagging classifier to predict renal tumor pathology. Final model performance was compared with expert interpretation and the most optimized radiomics model. Among the 1,162 renal lesions, 655 were malignant and 507 were benign. Compared with a baseline zero rule algorithm, the ensemble deep learning model had a statistically significant higher test accuracy (0.70 vs. 0.56, P = 0.004). Compared with all experts averaged, the ensemble deep learning model had higher test accuracy (0.70 vs. 0.60, P = 0.053), sensitivity (0.92 vs. 0.80, P = 0.017), and specificity (0.41 vs. 0.35, P = 0.450). Compared with the radiomics model, the ensemble deep learning model had higher test accuracy (0.70 vs. 0.62, P = 0.081), sensitivity (0.92 vs. 0.79, P = 0.012), and specificity (0.41 vs. 0.39, P = 0.770). Deep learning can noninvasively distinguish benign renal tumors from RCC using conventional MR imaging in a multi-institutional dataset with good accuracy, sensitivity, and specificity comparable with experts and radiomics.
- Research Article
33
- 10.1016/j.fuel.2021.121975
- Sep 15, 2021
- Fuel
An ensemble deep learning model for exhaust emissions prediction of heavy oil-fired boiler combustion
- Research Article
1
- 10.6000/1929-6029.2025.14.11
- Mar 3, 2025
- International Journal of Statistics in Medical Research
Diagnosing and treating patients at risk of chronic kidney disease (CKD) relies heavily on accurately classifying the disease. Deep learning models are receiving much interest in healthcare research owing to recent developments in the field. CKD datasets contain many features, but only some of them carry weight in the classification task, so irrelevant features need to be eliminated before classification. This paper proposes a hybrid feature selection method that combines two techniques: the Boruta algorithm and Recursive Feature Elimination (RFE). The Boruta algorithm ranks the features by their importance for CKD classification, and RFE refines the feature set by recursively eliminating the least important features. The hybrid method removes features with a low recursive score. The selected features are then given as input to the proposed ensemble deep learning method for classification. The ensemble deep learning model with feature selection is compared with Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF) models, each with and without feature selection. When feature selection is used, the ensemble model's accuracy improves by 2%. The experimental results show that four features (age, pus cell clumps, bacteria, and coronary artery disease) contribute little to classification accuracy. Accuracy, precision, and recall are used to evaluate the ensemble deep learning model.
- Research Article
- 10.1292/jvms.24-0518
- Dec 16, 2025
- The Journal of Veterinary Medical Science
The aim of this study was to distinguish canine lymphoma from other diseases, particularly reactive lymphoid hyperplasia (RLH), based on fine needle aspiration (FNA) images. We developed four deep learning models based on Vision Transformer (ViT) and Inception-v3, both pre-trained image classification models. Two of the four models were ViT and Inception-v3 themselves, and the remaining two were combinations, i.e., ensemble learning models, of ViT and Inception-v3: the mean of the class probabilities of ViT and Inception-v3 (Ensemble model A; MEAN) and the maximum probabilities of ViT and Inception-v3 (Ensemble model B; MAX). A total of 2,290 FNA images of canine lymphoma and 871 FNA images of RLH were analyzed. The FNA images were obtained from twenty-five slides of fourteen lymphoma cases and eight slides of seven RLH cases in two hospitals. Three types of training and test datasets were prepared from these images for fair evaluation of the models. Three deep learning-based image classification models (Inception-v3 and the two ensemble models) attained high performance, with >80% accuracy, recall, and area under the curve (AUC) values for all three datasets. ViT did not achieve high performance, except in precision (>0.85). This study is an example showing the potential of deep learning models for image classification problems in canine lymphoma.
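The two ensemble rules named in the abstract above (MEAN and MAX) can be illustrated with a short sketch. This is not the authors' code: it assumes each base model outputs one probability per class for an image and that MAX means taking the larger of the two models' probabilities class by class, which the abstract does not spell out. The probability values and helper names are hypothetical.

```python
def ensemble_mean(probs_a, probs_b):
    """Average the two models' probabilities class by class."""
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


def ensemble_max(probs_a, probs_b):
    """Keep the larger of the two models' probabilities for each class."""
    return [max(a, b) for a, b in zip(probs_a, probs_b)]


def predict_label(class_scores, labels):
    """Return the label whose combined score is highest."""
    best_index = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return labels[best_index]


labels = ["lymphoma", "RLH"]

# Hypothetical outputs of the two base models for a single image.
vit_probs = [0.90, 0.10]
inception_probs = [0.30, 0.70]

mean_scores = ensemble_mean(vit_probs, inception_probs)
max_scores = ensemble_max(vit_probs, inception_probs)

print("MEAN:", mean_scores, "->", predict_label(mean_scores, labels))
print("MAX: ", max_scores, "->", predict_label(max_scores, labels))
```

Note that the MAX scores no longer sum to 1, so under this reading they are best treated as ranking scores rather than calibrated probabilities.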
- Research Article
9
- 10.3390/w16162233
- Aug 8, 2024
- Water
The potential of generalized deep learning models developed for crop water estimation was examined in the current study. This study was conducted in a semiarid region of India, i.e., Karnataka, with daily climatic data (maximum and minimum air temperatures, maximum and minimum relative humidity, wind speed, sunshine hours, and rainfall) of 44 years (1976–2020) for twelve locations. The Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), and Random Forest (RF) are three ensemble deep learning models that were developed using all of the climatic data from a single location (Bengaluru) from January 1976 to December 2017 and then immediately applied at eleven different locations (Ballari, Chikmaglur, Chitradurga, Devnagiri, Dharwad, Gadag, Haveri, Koppal, Mandya, Shivmoga, and Tumkuru) without the need for any local calibration. For the test period of January 2018–June 2020, the model’s capacity to estimate the numerical values of crop water requirement (Penman-Monteith (P-M) ETo values) was assessed. The developed ensemble deep learning models were evaluated using the performance criteria of mean absolute error (MAE), average absolute relative error (AARE), coefficient of correlation (r), noise to signal ratio (NS), Nash–Sutcliffe efficiency (ɳ), and weighted standard error of estimate (WSEE). The results indicated that the WSEE values of RF, GB, and XGBoost models for each location were smaller than 1 mm per day, and the model’s effectiveness varied from 96% to 99% across various locations. While all of the deep learning models performed better with respect to the P-M ETo approach, the XGBoost model was able to estimate ETo with greater accuracy than the GB and RF models. The XGBoost model’s strong performance was also indicated by the decreased noise-to-signal ratio. Thus, in this study, a generalized mathematical model for short-term ETo estimates is developed using ensemble deep learning techniques. 
Because of this type of model’s accuracy in calculating crop water requirements and its ability for generalization, it can be effortlessly integrated with a real-time water management system or an autonomous weather station at the regional level.
- Research Article
1
- 10.1038/s41598-025-16364-z
- Aug 20, 2025
- Scientific Reports
Accurate estimation of evaporation is of great significance for the management of limited agricultural water resources. However, developing highly accurate and universal data-driven models using time-series analysis methods to achieve precise evaporation estimation remains challenging. Specifically, work integrating meta-heuristic algorithms, ensemble deep learning models, and data preprocessing techniques for evaporation prediction is notably scarce. The aim of this paper was to employ time series analysis methods to develop a data-driven model with high accuracy and universality for accurate estimation of evaporation. To this end, a Convolutional neural network (CNN) was integrated with a Bidirectional long short-term memory network (BiLSTM) as the main estimating module, and the Sparrow search algorithm (SSA) was employed to search for the optimal hyperparameters of CNN-BiLSTM. Because directly using the measured evaporation time series to predict evaporation may lead to large errors, Variational mode decomposition (VMD) was used to extract multiscale traits of the evaporation time series, and the Whale optimization algorithm (WOA) was adopted to find the optimal parameters of VMD. A novel hybrid deep learning model, WOA-VMD-CNN-SSA-BiLSTM, was thus proposed to estimate evaporation in Linze County, China. The estimating performance was evaluated using statistical accuracy metrics, including R², the mean squared error (MSE), the mean absolute error (MAE), the root mean squared error (RMSE), and the mean absolute percentage error (MAPE). The results show that the Sample entropy (SEn) remains 0.0832 when the optimal values of K and a of VMD are 6 and 0.1773, suggesting that VMD optimized using WOA effectively overcomes the subjectivity of traditional VMD parameter setting and realizes amplitude-dependent feature extraction of the evaporation time series in the study area. 
In addition, the performance of CNN-SSA-BiLSTM can be significantly improved by coupling it with WOA-VMD, and the hybrid model WOA-VMD-SSA-CNN-BiLSTM, with MSE = 0.1258, RMSE = 0.3547, MAE = 0.2833, and MAPE = 6.17% in the testing stage, is superior to the other hybrid and ensemble models and can be highly recommended for estimating evaporation in the study area.
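The error metrics named in the abstract above follow standard definitions, sketched below in plain Python. The observed and predicted values are hypothetical toy numbers, and R² is taken here to be the coefficient of determination, which is an assumption since the abstract does not define it. The last lines check that the reported RMSE of 0.3547 is consistent with the reported MSE of 0.1258, since RMSE is the square root of MSE.

```python
import math


def regression_metrics(observed, predicted):
    """Return MSE, RMSE, MAE, MAPE (in percent), and R² for paired series."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    sum_squared_error = sum(e * e for e in errors)
    mse = sum_squared_error / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an observed value is zero.
    mape = 100 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_observed = sum(observed) / n
    total_sum_of_squares = sum((o - mean_observed) ** 2 for o in observed)
    r2 = 1 - sum_squared_error / total_sum_of_squares
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# Hypothetical toy series, for illustration only (not from the study).
observed = [2.0, 4.0, 5.0, 8.0]
predicted = [2.5, 3.5, 5.0, 7.0]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Consistency check on the reported figures: RMSE should equal sqrt(MSE).
rmse_from_reported_mse = round(math.sqrt(0.1258), 4)
print("sqrt(0.1258) rounded to 4 places:", rmse_from_reported_mse)
```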