MACHINE LEARNING BASED CLOUD COMPUTING INTRUSION DETECTION
In today's technologically networked world, Software-Defined Networking (SDN), a sophisticated networking technology, is used in cloud computing environments to improve the effectiveness of network management. However, SDN's centralized nature makes it vulnerable to DDoS attacks. This study introduces a technique for detecting DDoS attacks in a cloud computing setting. The research applies an ensemble machine learning approach to statistically identify DDoS attacks in cloud network traffic, classifying traffic as either harmful or harmless. K-Nearest Neighbors (KNN), Random Forest (RF), and Decision Tree (DT) were used as the base classifiers of the proposed ensemble model. A dataset of SDN–DDoS attacks was used to assess the efficacy of the base classifiers, which were trained on 80% of the dataset and evaluated on the remaining 20%. The experimental results indicated that the Random Forest and Decision Tree classifiers attained 100% accuracy, whereas the K-Nearest Neighbor classifier achieved 98.21%. The ensemble model used majority voting for the final prediction and achieved 100% accuracy on the test set, making it the best-performing model compared with the benchmark models.
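The majority-voting step described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes each base classifier (KNN, RF, DT) has already produced one label per test flow, and the label strings, the sample data and the helper names `majority_vote` and `accuracy` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-classifier predictions by hard (majority) voting.

    predictions[m][i] is the label that base classifier m assigns to sample i.
    On a tie, the label encountered first (from the earliest classifier) wins,
    because Counter.most_common keeps first-encountered order among equal counts.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label the same samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical labels for four test flows from the three base classifiers.
knn = ["benign", "malicious", "malicious", "benign"]
rf = ["benign", "malicious", "benign", "benign"]
dt = ["malicious", "malicious", "benign", "benign"]
ensemble = majority_vote([knn, rf, dt])
print(ensemble)  # ['benign', 'malicious', 'benign', 'benign']
```

With three voters and two classes a tie cannot occur, which is one practical reason to use an odd number of base classifiers for hard voting.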
- Research Article
8
- 10.1007/s44327-025-00095-x
- Jun 2, 2025
- Discover Cities
The rapid increase in population, urbanization, and industrial activity in developing countries is intensifying pressure on groundwater resources, leading to severe water shortages. This study aims to evaluate and compare the predictive capabilities of six ensemble machine learning (ML) models, i.e., Random Forest (RF), AdaBoost, Neural Network, Decision Tree, k-Nearest Neighbors, and Extreme Gradient Boosting, for delineating groundwater potential zones by integrating ML algorithms with Geographic Information System (GIS) tools, offering a novel approach to groundwater resource mapping. Eleven conditioning factors, including elevation, slope, soil type, geomorphology, aspect, rainfall, land use/land cover, stream power index, topographic wetness index, and land surface temperature, were used as input parameters. Model performance was evaluated using multiple metrics: Area Under the Curve (AUC), Classification Accuracy, F1 Score, Precision, Recall, and Matthews Correlation Coefficient (MCC). The results revealed that RF was the most accurate model (AUC = 0.91), mapping the largest areas in the very high (346 sq. km) and low (486 sq. km) potential zones. AdaBoost, which is effective with imbalanced data, achieved the highest MCC (0.672). Sensitivity analysis revealed that geomorphology, elevation, and rainfall were the most influential parameters for groundwater potential zoning. This study highlights the potential of ensemble ML models for advancing groundwater resource assessment, offers a foundation for further exploration in urban regions facing water scarcity, and identifies priority areas for sustainable water use and planning.
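All of the count-based metrics listed above derive from the binary confusion matrix (for example, high-potential zone vs. not). A minimal sketch, assuming binary labels; the function name `binary_metrics` and the example counts are hypothetical, and AUC is omitted because it needs ranked scores rather than counts. MCC uses all four cells, which is why it is often preferred when classes are imbalanced.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, recall, F1 and MCC from binary confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0, a common convention.
    """
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 3) for name, value in m.items()})
```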
- Research Article
254
- 10.1016/j.cemconcomp.2021.104295
- Oct 13, 2021
- Cement and Concrete Composites
This study aims to provide an efficient and accurate machine learning (ML) approach for predicting the creep behavior of concrete. Three ensemble machine learning (EML) models are selected: Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM). First, the creep data in the Northwestern University (NU) database are preprocessed by a prebuilt XGBoost model and then split into a training set and a testing set. Then, using Bayesian optimization and 5-fold cross-validation, the three EML models are tuned to achieve high accuracy (R² = 0.953, 0.947, and 0.946 for LGBM, XGBoost, and RF, respectively). On the testing set, the EML models show significantly higher accuracy than the equation proposed by the fib Model Code 2010 (R² = 0.377). Finally, SHapley Additive exPlanations (SHAP), based on cooperative game theory, are calculated to interpret the predictions of the EML models. The five most influential parameters for concrete creep compliance, as identified by the SHAP values of the EML models, are: time since loading, compressive strength, age when loads are applied, relative humidity during the test, and temperature during the test. The patterns captured by the three EML models are consistent with the theoretical understanding of the factors that influence concrete creep, which indicates that the proposed EML models make reasonable predictions.
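In a tuning loop like the one described, each candidate hyperparameter set proposed by the Bayesian optimizer is scored by its mean validation metric over 5 folds of the training set. The splitting step can be sketched as below. The shuffling seed, the helper names and the `score_fold` callback (which would fit a model on the training indices and return, say, R² on the validation indices) are assumptions for illustration, not the study's code.

```python
import random
from typing import Callable, Iterator


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation.

    Every sample appears in exactly one validation fold; fold sizes differ by
    at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


def cv_score(score_fold: Callable[[list[int], list[int]], float],
             n_samples: int, k: int = 5) -> float:
    """Mean of score_fold(train_idx, val_idx) over the k folds."""
    scores = [score_fold(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

A Bayesian optimizer would call `cv_score` once per candidate configuration and keep the configuration with the best mean score.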
- Research Article
181
- 10.1016/j.epsr.2020.106904
- Oct 31, 2020
- Electric Power Systems Research
Ensemble machine learning models for the detection of energy theft
- Research Article
1
- 10.1186/s12933-025-02911-5
- Sep 30, 2025
- Cardiovascular diabetology
Early mortality prediction in critically ill patients with cardiovascular disease remains challenging. This study aimed to develop and validate an ensemble machine learning (ML) model to predict 30-day mortality, to compare its performance with conventional severity scores, and to assess the incremental prognostic value of the stress hyperglycemia ratio (SHR). A retrospective cohort of 1,595 ICU patients with cardiovascular disease combined with diabetes (2008-2022) was analyzed. SHR was calculated as admission glucose divided by estimated average glucose (eAG) derived from HbA1c. Six ML models (eXtreme Gradient Boosting [XGBoost], Decision Tree [DT], Random Forest [RF], Artificial Neural Network [ANN], Logistic Regression [LR], and Support Vector Machine [SVM]) were trained on 80% of the data, and the top three performers were combined into an ensemble model. Model performance was evaluated using area under the curve (AUC), precision-recall, calibration, and clinical utility metrics. The 30-day mortality rate in the entire cohort was 10.8% (n = 173). The ensemble model demonstrated superior predictive performance, with an AUC of 0.912 (95% CI: 0.888-0.936), outperforming both the individual ML models (e.g., XGBoost, AUC = 0.903) and traditional scoring systems (APS III/SOFA/SAPS II AUCs ≤ 0.742; all P < 0.001). The six most important predictors were anti-hypertensives, aspirin, blood urea nitrogen (BUN), white blood cell (WBC) count, age, and red blood cell (RBC) count, and Shapley Additive Explanations analysis revealed clinically meaningful patterns: a nonlinear risk escalation with age, linear risk increases with rising BUN and bilirubin levels, a protective effect associated with higher RBC counts, and increased early death risk at both low and high WBC levels.
While SHR significantly improved the performance of traditional scoring systems (e.g., increasing SOFA AUC from 0.741 to 0.757, P = 0.010), its addition to the ensemble model provided limited incremental benefit (ΔAUC = -0.032, P = 0.094). External validation in an independent cohort (n = 307) confirmed the model's robustness (AUC = 0.891, 95% CI: 0.864-0.917), with decision curve analysis demonstrating superior clinical utility across a wide range of risk thresholds. The ensemble ML model outperformed conventional prognostic tools in predicting 30-day mortality, with SHR augmenting traditional tools but not the ensemble ML model. This approach offers a reliable, interpretable framework for risk stratification in high-risk cardiovascular patients.
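The SHR definition above can be made concrete. The abstract does not state which HbA1c-to-eAG conversion was used; this sketch assumes the widely used ADAG linear formula (eAG in mg/dL = 28.7 × HbA1c − 46.7) with all glucose values in mg/dL. The function names and the example patient are hypothetical.

```python
def estimated_average_glucose(hba1c_percent: float) -> float:
    """eAG in mg/dL from HbA1c (%), assuming the ADAG regression eAG = 28.7*A1c - 46.7."""
    return 28.7 * hba1c_percent - 46.7


def stress_hyperglycemia_ratio(admission_glucose_mg_dl: float,
                               hba1c_percent: float) -> float:
    """SHR = admission glucose / eAG, both in mg/dL."""
    eag = estimated_average_glucose(hba1c_percent)
    if eag <= 0:
        raise ValueError("HbA1c too low for the linear eAG conversion")
    return admission_glucose_mg_dl / eag


# Hypothetical patient: HbA1c 7.0% gives an eAG of about 154.2 mg/dL.
print(round(stress_hyperglycemia_ratio(231.3, 7.0), 2))  # 1.5
```

An SHR above 1 means the admission glucose exceeds the patient's chronic average level, which is the acute "stress" signal the ratio is meant to capture.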
- Research Article
220
- 10.1016/j.conbuildmat.2020.118271
- Feb 17, 2020
- Construction and Building Materials
An ensemble machine learning approach for prediction and optimization of modulus of elasticity of recycled aggregate concrete
- Research Article
31
- 10.1016/j.eswa.2023.119768
- Mar 1, 2023
- Expert Systems with Applications
Ensemble machine learning-based models for estimating the transfer length of strands in PSC beams
- Research Article
83
- 10.1016/j.asr.2023.03.026
- Mar 21, 2023
- Advances in Space Research
Forest fire susceptibility mapping with sensitivity and uncertainty analysis using machine learning and deep learning algorithms
- Research Article
- 10.52783/jisem.v10i30s.4828
- Mar 29, 2025
- Journal of Information Systems Engineering and Management
Introduction: This research article describes the use of machine learning (ML) techniques in software-defined networking (SDN) to address Distributed Denial of Service (DDoS) attacks. As network operations and configurations have grown more complex, SDN has emerged as a promising network model that uses software-based controllers or application programming interfaces (APIs) to manage network traffic and communicate with the underlying hardware infrastructure. Unlike traditional networks, which use dedicated hardware (such as switches) to control network traffic, SDN can create and control a virtual network, or control traditional hardware, through software. With SDN, network intelligence is centralized in a software component called the SDN controller, giving network administrators the ability to effectively manage, protect, and optimize resources and to programmatically shape traffic across the entire network. This research comprehensively describes the use of ML algorithms to detect and prevent DDoS attacks. To determine the research gaps and opportunities for implementing an efficient security solution in SDN, we summarize the basic architecture of SDN, identify its security problems, find the optimal solution, and provide insights into long-term improvements in this field, along with a detailed comparison. Objectives: The objective of this paper is to describe the use of machine learning (ML) techniques in software-defined networking (SDN) to address Distributed Denial of Service (DDoS) attacks. Algorithms: To evaluate the performance and functionality of the proposed SDN, we carried out independent experiments using the Random Forest (RF), Decision Tree (DT), Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Linear Regression algorithms. Results: The first scenario performs better at detecting DDoS attacks, while the other scenarios are more effective at identifying low-frequency attacks.
In the best scenario, using the prevention method, over 64.13% of normal data is detected. Additionally, the proposed solution improves the DDoS detection and prevention rate by 9.67%. Conclusions: SDN has received enormous attention from both industry and the academic community. The intended contribution of our work is to answer the research questions. We carried out an in-depth examination of security applications deployed in SDN using ML technology and found that most studies included in our paper proposed SDN security and DDoS attack mitigation.
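KNN, one of the algorithms named above, labels a traffic sample by the classes of its closest labelled samples. A minimal from-scratch sketch follows; the two flow features (packet rate and new-flow rate, assumed to be pre-scaled to [0, 1]), all values, and the name `knn_predict` are hypothetical and not taken from the study.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest training points.

    Distances are Euclidean, so features should be on comparable scales.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    nearest = sorted(zip((math.dist(x, query) for x in train_X), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical scaled features: (packet rate, new-flow rate).
train_X = [(0.10, 0.20), (0.15, 0.10), (0.20, 0.25),
           (0.90, 0.80), (0.85, 0.95), (0.95, 0.90)]
train_y = ["normal"] * 3 + ["ddos"] * 3
print(knn_predict(train_X, train_y, (0.88, 0.85)))  # ddos
```

Because the distance is unweighted, a feature measured in much larger units (raw packets per second, say) would dominate the vote unless it is scaled first.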
- Research Article
1
- 10.32604/cmc.2022.030934
- Jan 1, 2022
- Computers, Materials & Continua
Protein structure prediction is one of the most essential objectives of theoretical chemistry and bioinformatics, as it is of vital importance in medicine, biotechnology, and other fields. Protein secondary structure prediction (PSSP) plays a significant role in protein tertiary structure prediction, as it bridges the gap between protein primary sequences and tertiary structure prediction. Protein secondary structures are classified into two categories: the 3-state category and the 8-state category. Predicting the 3 states and the 8 states of secondary structure from protein sequences are called the Q3 prediction and Q8 prediction problems, respectively. The 8 classes of secondary structure reveal more precise structural information for a variety of applications than the 3 classes; however, Q8 prediction has been found to be very challenging, which is why all previous work in PSSP has focused on Q3 prediction. In this paper, we develop an ensemble machine learning (ML) approach for Q8 PSSP to explore the performance of ensemble learning algorithms compared with that of individual ML algorithms in Q8 PSSP. The ensemble members considered for constructing the ensemble models are well-known classifiers, namely SVM (Support Vector Machine), KNN (K-Nearest Neighbor), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), combined with two feature extraction techniques, namely LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in the introduction of ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models. They also indicated that features extracted by LDA are more effective than those extracted by PCA.
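The Q3/Q8 distinction above is easy to see in code: Q8 accuracy is scored per residue over eight labels, and collapsing those labels to three can turn errors into matches. This sketch assumes one common DSSP reduction (H, G, I to helix; E, B to strand; everything else to coil) and uses toy sequences; it is not the paper's preprocessing, and other reductions exist.

```python
# One common 8-to-3 reduction of DSSP states (an assumption; other mappings exist):
# helices H, G, I -> H; strand E and bridge B -> E; everything else -> C (coil).
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C", "L": "C", "-": "C",
}


def reduce_to_q3(q8_states: str) -> str:
    """Map an 8-state secondary-structure string to 3 states."""
    return "".join(Q8_TO_Q3[s] for s in q8_states)


def per_residue_accuracy(pred: str, true: str) -> float:
    """Fraction of residues whose predicted state equals the true state."""
    if len(pred) != len(true) or not true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


true_q8 = "HHGGEEBTSC"
pred_q8 = "HHHHEEETTC"
q8_acc = per_residue_accuracy(pred_q8, true_q8)
q3_acc = per_residue_accuracy(reduce_to_q3(pred_q8), reduce_to_q3(true_q8))
print(q8_acc, q3_acc)  # 0.6 1.0
```

The same prediction scores 0.6 under Q8 but 1.0 under Q3, which illustrates why Q8 is the harder problem.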
- Research Article
13
- 10.1080/10298436.2024.2365957
- Jul 10, 2024
- International Journal of Pavement Engineering
Pavement condition prediction helps road agencies to schedule maintenance, rehabilitation, and reconstruction, and to allocate limited funds and resources to such activities. Compared to state highways, pavement performance prediction for local roads has received relatively little attention in the literature due to perceptions of low importance, low levels of investment in data collection, poor data quality, and high variation within the data elements. Additionally, local road pavement condition data may suffer from dataset imbalance, often leading to unreliable condition predictions. Hence, this paper introduces a methodology to predict local pavement condition using various single estimator and ensemble machine learning (ML) models along with the adaptive synthetic sampling method. The study develops nine (9) Bayesian-optimised ML models: category boosting (CatBoost), adaptive boosting, decision tree, extra trees, gradient boosting, light gradient-boosting machine, k-nearest neighbour, random forest, and artificial neural network. The ensemble ML and CatBoost were found to exhibit the best model performance, with an average testing accuracy of 0.82, specificity of 0.81, sensitivity of 0.63, and F-measure of 0.61. These results underscore the efficacy of ensemble ML models in pavement condition prediction. The proposed approach can be beneficial to local road agencies in their long-term planning, scheduling, and budgeting.
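The adaptive synthetic sampling (ADASYN) step mentioned above can be sketched as follows. This is a simplified, from-scratch illustration of the general ADASYN recipe (Euclidean distance, balancing factor `beta`), not the study's code; a maintained library implementation would normally be preferred, and the function name, the `Point` alias and the defaults here are assumptions.

```python
import math
import random
from typing import Sequence

Point = tuple[float, ...]


def adasyn(X_min: Sequence[Point], X_maj: Sequence[Point],
           beta: float = 1.0, k: int = 5, seed: int = 0) -> list[Point]:
    """Minimal ADASYN sketch returning synthetic minority-class samples.

    Minority points surrounded by more majority neighbours receive more synthetic
    samples, so generation concentrates where the minority class is hardest to learn.
    """
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    G = round((len(X_maj) - len(X_min)) * beta)  # total samples to synthesize
    data = [(x, True) for x in X_min] + [(x, False) for x in X_maj]  # True = minority

    # r_i: share of majority points among the k nearest neighbours of minority point i
    ratios = []
    for i, xi in enumerate(X_min):
        neigh = sorted((math.dist(xi, x), is_min)
                       for j, (x, is_min) in enumerate(data) if j != i)[:k]
        ratios.append(sum(not is_min for _, is_min in neigh) / len(neigh))

    total = sum(ratios)
    if G <= 0 or total == 0:
        return []  # already balanced, or no minority point borders the majority class

    synthetic: list[Point] = []
    for i, xi in enumerate(X_min):
        g_i = round(ratios[i] / total * G)  # samples allotted to this point
        # interpolation partners come from the k nearest *minority* neighbours
        partners = sorted((j for j in range(len(X_min)) if j != i),
                          key=lambda j: math.dist(xi, X_min[j]))[:k]
        for _ in range(g_i):
            xz = X_min[rng.choice(partners)]
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(xi, xz)))
    return synthetic
```

Oversampling like this should be applied to the training split only, so that synthetic points never leak into the test set.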
- Research Article
3
- 10.1016/j.compbiomed.2025.110008
- May 1, 2025
- Computers in biology and medicine
Machine learning prediction of overall survival in prostate adenocarcinoma using ensemble techniques.
- Research Article
273
- 10.1016/j.cemconres.2020.106164
- Jul 1, 2020
- Cement and Concrete Research
Prediction of surface chloride concentration of marine concrete using ensemble machine learning
- Research Article
- 10.38094/jastt62264
- Aug 8, 2025
- Journal of Applied Science and Technology Trends
Child mortality is a major global problem, especially in low- and middle-income countries, where there are large disparities in healthcare and social conditions. This investigation seeks to create a predictive model for child mortality and to pinpoint the key factors that contribute significantly to it, using machine learning (ML) methods. The dataset includes features such as parental age, maternal education, birth weight, wealth index, and access to healthcare services. Thirteen machine learning classifiers were used, categorized into four model groups: Traditional Models (Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Naive Bayes), Tree-Based Models (Decision Tree, Random Forest, Extra Trees), Boosting Models (AdaBoost, Gradient Boosting, XGBoost), and Ensemble Learning Models (Soft Voting, Hard Voting, Stacking). Each model was assessed using classification metrics, including Accuracy, Precision, Recall, and F1-Score, within a 10-fold cross-validation framework to ensure robustness. Results indicate that ensemble models, particularly AdaBoost, achieved the highest predictive accuracy, with perfect scores across all metrics (1.00). XGBoost and Stacking also demonstrated strong and consistent performance. The findings indicate that ensemble learning methods are effective in predicting child mortality and can help policymakers and healthcare planners identify high-risk populations and implement targeted interventions to reduce child mortality.
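The study compares hard and soft voting ensembles. The difference is easiest to see with binary probabilities: hard voting counts each model's thresholded label, while soft voting averages the probabilities first, so one confident model can swing the result. A minimal sketch with hypothetical probabilities and an assumed 0.5 threshold; stacking, the third strategy named, would instead train a meta-model on these outputs and is not shown.

```python
from typing import Sequence


def hard_vote(probas: Sequence[Sequence[float]], threshold: float = 0.5) -> list[int]:
    """Each model votes 1 if its probability >= threshold; strict majority wins (ties -> 0)."""
    n_models = len(probas)
    return [1 if 2 * sum(p >= threshold for p in sample) > n_models else 0
            for sample in zip(*probas)]


def soft_vote(probas: Sequence[Sequence[float]], threshold: float = 0.5) -> list[int]:
    """Average the models' probabilities, then apply the threshold once."""
    return [1 if sum(sample) / len(sample) >= threshold else 0
            for sample in zip(*probas)]


# probas[m][i]: hypothetical P(child death) from model m for child i.
probas = [
    [0.45, 0.90],  # model A
    [0.45, 0.80],  # model B
    [0.95, 0.10],  # model C
]
print(hard_vote(probas), soft_vote(probas))  # [0, 1] [1, 1]
```

For the first child, two models lean slightly negative, so hard voting predicts 0, but model C's high confidence lifts the average probability to about 0.62, so soft voting predicts 1.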
- Research Article
9
- 10.3390/w15101923
- May 19, 2023
- Water
Accurate and reliable discharge estimation plays an important role in water resource management as well as in downstream applications such as ecosystem conservation and flood control. Recently, data-driven machine learning (ML) techniques have shown seemingly unmatched performance in runoff forecasting and other geophysical domains, but their reliability and interpretability still need to be improved. In this study, focusing on discharge estimation and management, we developed an ML-based framework and applied it to the Huitanggou sluice hydrological station in Anhui Province, China. The framework contains two ML algorithms: the ensemble learning random forest (ELRF) and the ensemble learning gradient boosting decision tree (ELGBDT). SHapley Additive exPlanations (SHAP) were introduced into the framework to interpret the impact of the model features. In our framework, correlation analysis of the dataset provides feature information for modeling, and the quartile method was used to handle outliers in the dataset. A Bayesian optimization algorithm was adopted to tune the hyperparameters of the ensemble ML models. The ensemble ML models were further compared with the traditional stage–discharge rating curve (SDRC) method and a single ML model. The results show that the estimation performance of the ensemble ML models is superior to that of both the SDRC and the single ML model. In addition, an analysis of discharge estimation without considering the flow state was performed, which revealed that the ensemble ML models have strong adaptability. The ensemble ML models accurately estimate the discharge, with a coefficient of determination of 0.963, a root mean squared error of 31.268, and a correlation coefficient of 0.984.
Our framework can help improve the efficiency of short-term hydrological estimation while also interpreting the impact of hydrological features on the estimation results.
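The abstract does not define its "quartile method". A common reading is Tukey's fences, which discard points lying more than 1.5 interquartile ranges beyond the first or third quartile. The sketch below assumes that rule, together with the RMSE and coefficient-of-determination formulas used to report the results; the helper names and toy data are hypothetical.

```python
import math
import statistics
from typing import Sequence


def iqr_filter(values: Sequence[float], factor: float = 1.5) -> list[float]:
    """Drop values outside [Q1 - factor*IQR, Q3 + factor*IQR] (assumed Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target (e.g. discharge)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


print(iqr_filter([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Quartile conventions differ between tools, so the exact fences (and hence which borderline points are dropped) depend on the interpolation method chosen; `method="inclusive"` here is an assumption.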
- Research Article
15
- 10.1177/20552076231173225
- Jan 1, 2023
- Digital health
Electronic health records provide the opportunity to use machine learning techniques to identify undiagnosed individuals likely to have a given disease, who could then benefit from further medical screening and case finding, reducing the number needed to screen and offering convenience and healthcare cost savings. Ensemble machine learning models, which combine multiple prediction estimates into one, are often said to provide better predictive performance than non-ensemble models. Yet, to our knowledge, no literature review summarises the use and performance of different types of ensemble machine learning models in the context of medical pre-screening. We aimed to conduct a scoping review of the literature reporting the derivation of ensemble machine learning models for screening electronic health records. We searched the EMBASE and MEDLINE databases across all years, applying a formal search strategy using terms related to medical screening, electronic health records, and machine learning. Data were collected, analysed, and reported in accordance with the PRISMA scoping review guideline. A total of 3355 articles were retrieved, of which 145 met our inclusion criteria and were included in this study. Ensemble machine learning models were increasingly employed across several medical specialties and often outperformed non-ensemble approaches. Ensemble machine learning models with complex combination strategies and heterogeneous classifiers often outperformed other types of ensemble machine learning models but were also used less often. The methodologies, processing steps, and data sources of ensemble machine learning models were often not clearly described. Our work highlights the importance of deriving and comparing the performance of different types of ensemble machine learning models when screening electronic health records and underscores the need for more comprehensive reporting of the machine learning methodologies employed in clinical research.