NBA Results Forecast: From League Dynamics Analysis to Predictive Model Implementation
This study presents a machine learning-based approach to predicting the outcomes of NBA games, with the aim of enhancing decision-making in sports betting and performance analysis. Using a dataset spanning 20 NBA seasons (2003–2023), we incorporated key features such as team statistics, player performance metrics, and external factors like team fatigue and rankings. The methodology followed the CRISP-DM process, involving data preprocessing, feature selection, and model evaluation. We experimented with multiple classification algorithms, including Logistic Regression, Random Forest, Gradient Boosting, and ensemble methods, to identify the best-performing models. Feature selection techniques such as LASSO and decision tree-based methods were employed to optimize model performance. Our best model, combining team rankings, statistics, and fatigue factors, achieved an accuracy rate of 64.1% and an F1 score of 72.4%, reflecting the complexity of NBA game outcome prediction. The study highlights the importance of key features like team rankings and the challenges posed by the dynamic nature of the NBA. Future research will explore additional qualitative factors, such as emotional states and team dynamics, and employ more advanced machine learning techniques like deep learning to further improve prediction accuracy.
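The LASSO-style feature selection and classification pipeline described above can be sketched as follows. This is a minimal illustration on synthetic data (the NBA dataset and feature names are not reproduced here), using L1-penalized logistic regression as the LASSO-style selector.

```python
# Sketch: L1 (LASSO-style) feature selection feeding a game-outcome classifier.
# All data here is synthetic; the 30 columns stand in for team/player features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# An L1-penalized model zeroes out weak coefficients; SelectFromModel keeps
# only the features with non-zero weight (the LASSO selection idea).
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
selector.fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

clf = LogisticRegression(max_iter=1000).fit(X_tr_sel, y_tr)
pred = clf.predict(X_te_sel)
acc, f1 = accuracy_score(y_te, pred), f1_score(y_te, pred)
print(f"kept {X_tr_sel.shape[1]}/30 features, acc={acc:.3f}, F1={f1:.3f}")
```

The same selector can be swapped for a tree-based importance filter, mirroring the decision tree-based selection the study also evaluated.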
- Research Article
- 10.1007/s42452-020-3060-1
- Jun 30, 2020
- SN Applied Sciences
Decision tree-based classifier ensembles are a machine learning (ML) technique that combines several tree models to produce an effective or optimal predictive model, typically yielding better predictive performance than a single model. Selecting a proper ML algorithm thus helps us understand possible future occurrences by analyzing the past more accurately. The main purpose of this study is to produce a landslide susceptibility map of the Ayancik district of Sinop province, situated in the Black Sea region of Turkey, using three regression tree-based ensemble methods: gradient boosting machines (GBM), extreme gradient boosting (XGBoost), and random forest (RF). Fifteen landslide causative factors and 105 landslide locations recorded in the region were used. The landslide inventory map was randomly divided into training (70%) and testing (30%) datasets to construct the RF, XGBoost, and GBM prediction models. A symmetrical uncertainty measure was used to determine the most important causative factors, and the selected features were then used to construct the susceptibility prediction models. The performance of the ensemble models was validated using several accuracy metrics, including area under the curve (AUC), overall accuracy (OA), root mean square error (RMSE), and the Kappa coefficient. The Wilcoxon signed-rank test was also used to assess differences between the optimum models. The accuracy results showed that the XGBoost_Opt model (the model created from the optimum factor combination) has the highest prediction capability (OA = 0.8501 and AUC = 0.8976), followed by RF_Opt (OA = 0.8336 and AUC = 0.8860) and GBM_Opt (OA = 0.8244 and AUC = 0.8796). The Wilcoxon signed-rank test confirmed that the XGBoost_Opt model, built from the best feature-subset combination, was statistically significantly different from the other models. Overall, the optimized XGBoost model achieved lower prediction error and higher accuracy than the other ensemble methods.
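The head-to-head ensemble comparison above can be sketched as follows, scored by OA and AUC with the paper's 70/30 split. The data is synthetic (the landslide inventory is not reproduced), and sklearn's GradientBoostingClassifier stands in for both GBM and XGBoost so the sketch needs no extra dependencies.

```python
# Sketch: RF vs. gradient boosting on a synthetic binary
# "landslide / no landslide" dataset, evaluated by OA and AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# 15 columns stand in for the fifteen causative factors
X, y = make_classification(n_samples=1500, n_features=15, n_informative=6,
                           random_state=1)
# 70% training / 30% testing split, as in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

results = {}
for name, model in [("RF", RandomForestClassifier(random_state=1)),
                    ("GBM", GradientBoostingClassifier(random_state=1))]:
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    results[name] = (accuracy_score(y_te, model.predict(X_te)),
                     roc_auc_score(y_te, proba))

for name, (oa, auc) in results.items():
    print(f"{name}: OA={oa:.4f}, AUC={auc:.4f}")
```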
- Research Article
- 10.11591/ijeecs.v38.i2.pp1149-1161
- May 1, 2025
- Indonesian Journal of Electrical Engineering and Computer Science
Accuracy in evaluating the risk of credit applications is crucial for lenders, particularly when dealing with unsecured loans. Accuracy can be enhanced by selecting suitable features for a machine learning model. To better identify high-risk borrowers, this study applies an elaborate feature selection technique. It uses the light gradient boosting machine (LGBM) classifier with the gradient boosting decision tree (GBDT) boosting type and an n_estimators value of 100 for the feature selection process, and applies an advanced machine learning technique, stacking, to improve model accuracy. The dataset consists of 307,506 applicants from European lenders who have applied for loans in Southeast Asia. Each applicant is described by 126 different features. Using the GBDT algorithm, the 30 best features were selected based on the accuracy they yielded relative to other feature subsets. By employing a stacking technique that combines the LGBM, gradient boosting (GB), and random forest (RF) models, with logistic regression (LR) as the final estimator, an accuracy of 0.99637 was reached. This study demonstrates improved accuracy compared to previous research, indicating that combining feature selection with a stacking method provides one of the most precise approaches to binary classification among current models.
- Research Article
- 10.12732/ijam.v38i3s.699
- Oct 13, 2025
- International Journal of Applied Mathematics
This study applies machine learning (ML) techniques to model and predict Assam’s agricultural Gross State Domestic Product (GSDP). Three predictive models—multiple linear regression, random forest regression, and gradient boosting—are evaluated. The random forest model achieved the best fit, exhibiting the highest R² and the lowest mean squared error (MSE) and Akaike information criterion (AIC), along with statistically significant coefficients. Ensemble methods (random forest and gradient boosting) markedly improve forecast accuracy of agricultural growth trends compared to traditional regression, yielding more reliable predictions of productivity and GSDP contributions. The findings underscore the vital role of agricultural productivity in driving economic growth, strengthening GSDP, and supporting food security and employment. Integrating advanced ML techniques with statistical analysis provides insights for policymakers to make data-driven decisions that foster sustainable agricultural development and economic prosperity in Assam. Objectives: Predict Assam’s agricultural sector performance using selected machine learning models. Evaluate and compare the effectiveness of these models in assessing the state’s agricultural economy. Methods: Data preprocessing involved handling outliers (using interquartile range and mean-max scaling) and feature selection via correlation heatmaps. Predictive models (multiple linear regression, random forest regression, and gradient boosting) were implemented in Python. Results: The gradient boosting model emerged as the most effective, achieving the highest accuracy and generalization (testing R² = 0.9867). Farm area, labour, maize yield, and autumn rice yield were the most significant positive contributors to GSDP. The random forest model performed similarly well (R² = 0.9867), while the multiple linear regression model was least accurate (R² = 0.9521), likely due to its inability to capture nonlinear relationships. 
Conclusions: Machine learning models offer transformative potential for Assam’s agricultural sector. Leveraging data-driven insights from these models can empower policymakers to design targeted interventions, promoting inclusive and sustainable economic growth in the region.
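The three-way model comparison above can be sketched as follows on synthetic data (Assam's GSDP series is not reproduced). R² and MSE come straight from sklearn; the AIC shown is the Gaussian form n·ln(RSS/n) + 2k, and the parameter count k is only well-defined for the linear model, so for the ensembles it is a rough placeholder.

```python
# Sketch: linear regression vs. random forest vs. gradient boosting,
# scored by R², MSE, and a Gaussian AIC. Data is synthetic.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)

def gaussian_aic(y_true, y_pred, k):
    """AIC under a Gaussian error model; k = number of fitted parameters."""
    n = len(y_true)
    rss = float(np.sum((y_true - y_pred) ** 2))
    return n * np.log(rss / n) + 2 * k

scores = {}
for name, model in [("MLR", LinearRegression()),
                    ("RF", RandomForestRegressor(random_state=7)),
                    ("GBM", GradientBoostingRegressor(random_state=7))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # k = features + intercept; exact for MLR, a placeholder for the ensembles
    scores[name] = (r2_score(y_te, pred), mean_squared_error(y_te, pred),
                    gaussian_aic(y_te, pred, X.shape[1] + 1))

for name, (r2, mse, aic) in scores.items():
    print(f"{name}: R2={r2:.4f}, MSE={mse:.1f}, AIC={aic:.1f}")
```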
- Research Article
- 10.1108/ijicc-03-2025-0128
- Oct 2, 2025
- International Journal of Intelligent Computing and Cybernetics
Purpose This study aims to enhance cryptocurrency price and trend prediction by applying advanced machine learning (ML) techniques. Given the market’s high volatility and complexity, the research identifies effective models for different conditions, providing insights for investors and risk management. Design/methodology/approach This study proposes a six-stage framework for cryptocurrency price prediction, integrating advanced ML techniques. Data from ten cryptocurrencies are processed, extracting 37 key features, including return, the Fear and Greed Index and various technical indicators. The model employs Deep Q-Networks (DQN), Long Short-Term Memory (LSTM) and multiple regression methods such as linear regression, support vector regression, ridge, LASSO, decision tree, Random Forest, multi-layer perceptron, stochastic gradient descent, elastic net and Bayesian regression. Model performance is evaluated using trading strategies and metrics like accuracy, sensitivity, recall, MSE, MAE and F1-score. Findings The results indicate that complex models like DQN and LSTM excel in volatile markets due to their ability to capture intricate price patterns, whereas simpler models such as linear regression and ridge regression perform better in stable conditions. The multi-layered parallel design enhances computational efficiency, enabling independent asset evaluation. These findings highlight the potential of artificial intelligence in improving prediction accuracy and supporting informed investment decisions. Originality/value This research introduces a novel six-stage ML framework incorporating diverse predictive models and key features for cryptocurrency forecasting. The multi-layered parallel approach enhances computational efficiency, setting this study apart from existing research. The comparative analysis of models offers valuable guidance for investors, traders and financial analysts navigating volatile cryptocurrency markets.
- Research Article
- 10.1108/jes-03-2025-0174
- Sep 19, 2025
- Journal of Economic Studies
Purpose This study constructs a fully balanced panel dataset for 135 countries spanning 2013–2022 to explore the determinants of international trade. It employs classical econometric techniques – Robust Least Squares (RLS), Generalized Linear Model (GLM) and quantile regression – to capture linear effects, heterogeneity and distributional nuances. Complementing these, advanced Machine Learning (ML) methods – including Gradient Boosting Machine (GBM), bagging via Random Forest and an ensemble stacking model – uncover nonlinear relationships and complex interactions. All numeric variables are scaled, and a training/testing split is implemented, ensuring robust performance evaluation through metrics such as MAE, MSE, RMSE and R2. Design/methodology/approach Advanced ML techniques are utilized extensively for both regression and robustness checks. For regression, ML methods such as bagging via Random Forest, boosting and stacking with a meta-learner are employed. Findings Empirical evidence from both econometric and ML analyses reveals that a strong business environment (BE), high-tech exports (HTE), robust ICT services imports (ICTSI) and widespread ICT use (ICTU) significantly promote trade intensity across 135 countries from 2013 to 2022. Quantile regressions indicate that HTE’s positive impact intensifies at higher trade quantiles, whereas persistent underinvestment in R&D (RDC) consistently hampers trade performance. Advanced ML models, particularly GBM and ensemble stacking, further capture nonlinearities and interactions, reinforcing these findings and underscoring the critical role of digital infrastructure and innovation ecosystems in driving global trade competitiveness. Originality/value This study uniquely bridges classical econometrics with state-of-the-art ML to examine the trade–innovation nexus. 
It harnesses a fully balanced panel of 135 countries (2013–2022) and employs RLS, GLM, quantile regression, alongside advanced ML techniques like gradient boosting, bagging via Random Forest and stacking ensembles. This dual approach not only captures both linear and nonlinear dynamics but also enhances predictive accuracy and model interpretability. The integration of these methods sets a novel benchmark, offering robust, data-driven insights and context-specific policy recommendations that enrich the literature on global trade patterns amid rapid technological advancement.
- Research Article
- 10.9734/ajpas/2024/v26i7626
- Jun 11, 2024
- Asian Journal of Probability and Statistics
In the rapidly evolving landscape of retail analytics, the accurate prediction of sales figures holds paramount importance for informed decision-making and operational optimization. Leveraging diverse machine learning methodologies, this study aims to enhance the precision of Walmart sales forecasting, utilizing a comprehensive dataset sourced from Kaggle. Exploratory data analysis reveals intricate patterns and temporal dependencies within the data, prompting the adoption of advanced predictive modeling techniques. Through the implementation of linear regression, ensemble methods such as Random Forest, Gradient Boosting Machines (GBM), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), this research endeavors to identify the most effective approach for predicting Walmart sales. Comparative analysis of model performance showcases the superiority of advanced machine learning algorithms over traditional linear models. The results indicate that XGBoost emerges as the optimal predictor for sales forecasting, boasting the lowest Mean Absolute Error (MAE) of 1226.471, Root Mean Squared Error (RMSE) of 1700.981, and an exceptionally high R-squared value of 0.9999900, indicating near-perfect predictive accuracy. This model's performance significantly surpasses that of simpler models such as linear regression, which yielded an MAE of 35632.510 and an RMSE of 80153.858. Insights from bias and fairness measurements underscore the effectiveness of advanced models in mitigating bias and delivering equitable predictions across temporal segments. Our analysis revealed varying levels of bias across different models. Linear Regression, Multiple Regression, and GLM exhibited moderate bias, suggesting some systematic errors in predictions. Decision Tree showed slightly higher bias, while Random Forest demonstrated a unique scenario of negative bias, implying systematic underestimation of predictions. 
However, models like GBM, XGBoost, and LGB displayed biases closer to zero, indicating more accurate predictions with minimal systematic errors. Notably, the XGBoost model demonstrated the lowest bias, with a mean bias of -7.548432 (Table 4), reflecting its superior ability to minimize prediction errors across different conditions. Additionally, fairness analysis revealed that XGBoost maintained robust performance in both holiday and non-holiday periods, with an MAE of 84273.385 for holidays and 1757.721 for non-holidays. Insights from the fairness measurements revealed that Linear Regression, Multiple Regression, and GLM showed consistent predictive performance across both subgroups. Meanwhile, Decision Tree performed similarly for holiday predictions but exhibited better accuracy for non-holiday sales, whereas Random Forest, XGBoost, GBM, and LGB models displayed lower MAE values for the non-holiday subgroup, indicating potential fairness issues in predicting holiday sales. The study also highlights the importance of model selection and the impact of advanced machine learning techniques on achieving high predictive accuracy and fairness. Ensemble methods like Random Forest and GBM also showed strong performance, with Random Forest achieving an MAE of 12238.782 and an RMSE of 19814.965, and GBM achieving an MAE of 10839.822 and an RMSE of 1700.981. This research emphasizes the significance of leveraging sophisticated analytics tools to navigate the complexities of retail operations and drive strategic decision-making. By utilizing advanced machine learning models, retailers can achieve more accurate sales forecasts, ultimately leading to better inventory management and enhanced operational efficiency. The study reaffirms the transformative potential of data-driven approaches in driving business growth and innovation in the retail sector.
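The bias measure discussed above can be illustrated with a few lines of NumPy: the mean signed error (prediction minus actual), where values near zero indicate no systematic over- or under-prediction and negative values indicate underestimation. The numbers below are invented for the example, not Walmart figures.

```python
# Minimal illustration of mean signed error ("bias") for sales forecasts.
import numpy as np

actual = np.array([100.0, 250.0, 175.0, 300.0, 225.0])
pred_a = np.array([110.0, 260.0, 185.0, 310.0, 235.0])  # systematic over-prediction
pred_b = np.array([101.0, 249.0, 176.0, 299.0, 226.0])  # near-zero bias

def mean_bias(y_pred, y_true):
    """Mean signed error; negative => systematic underestimation."""
    return float(np.mean(y_pred - y_true))

bias_a = mean_bias(pred_a, actual)
bias_b = mean_bias(pred_b, actual)
print(bias_a, bias_b)  # 10.0 0.2
```

Unlike MAE, the signed errors can cancel, which is exactly what makes this a bias measure rather than an accuracy measure.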
- Research Article
- 10.1002/brb3.70770
- Aug 1, 2025
- Brain and behavior
This study aims to create a reliable and scalable framework for detecting Parkinson's disease (PD) using spiral drawings. It integrates advanced machine learning techniques to improve diagnostic accuracy and practical application in clinical settings. Spiral drawing data were collected from a comprehensive dataset, including samples from both Parkinson's patients and healthy individuals. Three deep learning models (ResNet50, VGG16, and EfficientNetB0) were used to extract detailed patterns from the drawings. To enhance model performance, four feature selection techniques were applied: Principal Component Analysis (PCA), Recursive Feature Elimination (RFE), Least Absolute Shrinkage and Selection Operator (LASSO), and ANOVA. Six different classifiers (Support Vector Machine [SVM], Random Forest [RF], Multi-Layer Perceptron [MLP], XGBoost, CatBoost, and voting classifiers) were tested. The system's diagnostic accuracy was measured using four metrics: accuracy, sensitivity, F1-score, and AUC-ROC. Heatmaps and ROC curves were created to visualize the results. The models achieved high classification performance with different configurations. For example, ResNet50 with PCA and MLP reached the highest accuracy (98%) and AUC-ROC (97%). Similarly, SVM with PCA achieved accuracy (92%) and AUC-ROC (98%). For VGG16, combining LASSO with XGBoost resulted in high F1-scores (90%) and AUC-ROC (93%), while the voting classifiers with PCA achieved an AUC-ROC of 98%. EfficientNetB0 combined with RFE and XGBoost delivered exceptional accuracy (98%) with robust overall metrics. CatBoost with LASSO achieved balanced performance, showing high sensitivity (89%) and AUC-ROC (96%). Ensemble methods, like voting classifiers, consistently provided strong AUC-ROC values but showed variability in accuracy and sensitivity compared to individual classifiers like MLP and SVM.
The study demonstrated that combining advanced techniques for feature extraction, selection, and classification can significantly improve PD detection accuracy. Future research should focus on integrating multiple data sources and exploring real-time applications to enhance scalability and clinical utility.
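One of the winning configurations above (deep features → PCA → MLP) can be sketched as a pipeline. Random vectors stand in for the CNN embeddings of spiral drawings, since the dataset and the pretrained-network extraction step are not reproduced here.

```python
# Sketch: deep-feature vectors -> PCA compression -> MLP classifier.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# 512-dim vectors as a stand-in for ResNet50 embeddings (PD vs. healthy)
X, y = make_classification(n_samples=600, n_features=512, n_informative=20,
                           random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=3)

clf = make_pipeline(StandardScaler(),
                    PCA(n_components=50),  # compress the deep features
                    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                  random_state=3))
clf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC-ROC: {auc:.3f}")
```

Putting the scaler, PCA, and classifier in one pipeline keeps the PCA basis fitted on training data only, avoiding leakage into the test metrics.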
- Research Article
- 10.48175/ijarsct-24418
- Mar 24, 2025
- International Journal of Advanced Research in Science, Communication and Technology
The accurate prediction of crop yield is crucial for effective agricultural planning and food security. This study evaluates the performance of various machine learning algorithms in predicting crop yields, focusing on both traditional statistical methods and advanced machine learning techniques. The research compares models such as Linear Regression, Decision Trees, Random Forests, Support Vector Machines (SVM), and Neural Networks, assessing their accuracy, computational efficiency, and robustness across diverse datasets representing different climatic and geographic conditions. The data used in this study encompasses a wide range of environmental factors, including soil properties, weather conditions, and historical yield data. Feature selection and engineering techniques are applied to enhance model performance, while cross-validation methods ensure the reliability of the results. The evaluation criteria include metrics such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared, providing a comprehensive view of each model's predictive capabilities. Our findings reveal that ensemble methods, particularly Random Forests and Gradient Boosting Machines, outperform other algorithms in terms of accuracy and generalizability. Neural Networks also demonstrate strong predictive power, particularly when large datasets are available, although they require more computational resources and fine-tuning. In contrast, simpler models like Linear Regression and SVMs, while less accurate, offer faster training times and are easier to interpret, making them suitable for scenarios with limited computational resources or when model interpretability is critical.
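The cross-validated evaluation protocol described above (RMSE, MAE, and R² together) can be sketched in a few lines. The "environmental" features below are synthetic, not real soil or weather data.

```python
# Sketch: 5-fold cross-validated RMSE, MAE, and R² for a yield regressor.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=5)

cv = cross_validate(RandomForestRegressor(random_state=5), X, y, cv=5,
                    scoring=("neg_root_mean_squared_error",
                             "neg_mean_absolute_error", "r2"))
# sklearn reports error metrics negated so that "higher is better" holds
rmse = -cv["test_neg_root_mean_squared_error"].mean()
mae = -cv["test_neg_mean_absolute_error"].mean()
r2 = cv["test_r2"].mean()
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```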
- Research Article
- 10.59324/stss.2025.2(8).07
- Aug 1, 2025
- Scientia. Technology, Science and Society
Proton exchange membrane fuel cells (PEMFCs) are a major player in the conversion of hydrogen energy and are essential for the realization of an environmentally friendly society. However, their cost and performance have yet to meet the requirements for widespread commercial adoption. Hence, this research aims to expand our understanding of PEMFC performance by investigating the complex association between different operational factors and the real part of impedance (z_real). The principal objective is to predict z_real from a comprehensive set of input variables using advanced machine learning techniques. The impedance, which represents the fuel cell's opposition to electric current flow, is a complex quantity comprising real and imaginary components, and is central to understanding the polarization processes of PEMFCs, especially from the viewpoint of frequency analysis. Obtaining frequency-domain impedance, which reveals dynamic losses, from ordinary sensor signals without expensive impedance-measuring equipment is beneficial: the impedance data can then be used to assess the internal condition of the fuel cell and enhance system control. Unlike existing studies leveraging machine learning for similar predictions, this research introduces a novel dimension by undertaking a rigorous comparative analysis of ensemble techniques. While prior research has applied machine learning to forecast fuel cell behaviour, none has systematically evaluated and compared the performance of diverse ensemble methods on this specific task. Ensemble techniques, known for their capability to enhance predictive accuracy by combining multiple models, offer a promising prospect for achieving more robust predictions of z_real. The methodology involves the rigorous exploration of a rich dataset derived from Nafion 112 membrane standard tests and Membrane Electrode Assembly (MEA) activation experiments. The dataset comprises polarization and impedance curves, providing a broad view of the fuel cell's response across various H2/O2 gas pressures, voltages, and humidity conditions. Leveraging this dataset, the study employs machine learning algorithms, including ensemble methods such as Random Forest, Gradient Boosting, and Bagging, to predict z_real. The outcomes of this research extend beyond mere prediction, incorporating a nuanced understanding of how distinct factors influence the complex impedance behaviour of PEMFCs. Furthermore, the comparative analysis of ensemble techniques aims to elucidate which method or combination produces the most accurate predictions. This study not only contributes valuable insights to the evolving field of fuel cell optimization but also adds a unique perspective on the application of ensemble techniques to predicting critical electrochemical parameters. The findings stand to advance existing knowledge of PEMFC dynamics and encourage a more informed approach to enhancing performance under diverse operational conditions.
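The ensemble-regression comparison described above can be sketched as follows. The features and target are synthetic stand-ins (the Nafion 112 / MEA measurements are not reproduced), with the continuous target playing the role of z_real.

```python
# Sketch: bagging vs. RF vs. gradient boosting on a continuous target.
from sklearn.datasets import make_regression
from sklearn.ensemble import (BaggingRegressor, RandomForestRegressor,
                              GradientBoostingRegressor)
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# columns stand in for pressure, voltage, humidity, frequency, ...
X, y = make_regression(n_samples=500, n_features=6, noise=5.0, random_state=11)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=11)

r2 = {}
for name, model in [("Bagging", BaggingRegressor(random_state=11)),
                    ("RF", RandomForestRegressor(random_state=11)),
                    ("GBM", GradientBoostingRegressor(random_state=11))]:
    r2[name] = r2_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
print({k: round(v, 3) for k, v in r2.items()})
```

Bagging and RF average independently grown trees (variance reduction), while gradient boosting fits trees sequentially to residuals (bias reduction), which is the contrast the study's comparison probes.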
- Research Article
- 10.32620/reks.2025.1.06
- Feb 20, 2025
- Radioelectronic and Computer Systems
The subject matter of this article is enhancing credit card fraud detection systems by exploring the impact of oversampling rates and ensemble methods with diverse feature selection techniques. Credit card fraud has become a major issue in the financial world, leading to substantial losses for both financial institutions and consumers. As the volume of credit card transactions continues to grow, accurately detecting fraudulent behavior has become increasingly challenging. The goal of this study is to enhance credit card fraud detection by analyzing oversampling rates to select the optimal one for the highest-performing models and using ensemble techniques based on diverse feature selection approaches. The key tasks undertaken in this study include assessing the models’ performance based on accuracy, recall, and AUC scores, analyzing the effect of oversampling using the Synthetic Minority Over-sampling Technique (SMOTE), and proposing an ensemble method that combines the strengths of different feature selection techniques and classifiers. The methods used in this research involve applying a range of machine learning techniques, including logistic regression, decision trees, random forests, and gradient boosting, to an imbalanced dataset where legitimate transactions significantly outnumber fraudulent ones. To address the data imbalance, the researchers systematically investigated the impact of varying oversampling rates using SMOTE. Additionally, they developed an ensemble model that integrates seven feature selection methods with the eXtreme Gradient Boosting (XGB) algorithm. The results show that the application of SMOTE significantly improves the performance of the machine learning models, with an optimal oversampling rate of 20% identified. The XGB model stood out for its exceptional performance, with high accuracy, recall, and AUC scores. 
Furthermore, the proposed ensemble approach, which combines the strengths of the diverse feature selection techniques and the XGB classifier, further enhances the detection accuracy and system performance compared to the traditional methods. The conclusions drawn from this research contribute to advancing the field of credit card fraud detection by providing insights into the impact of oversampling and the benefits of ensemble methods with diverse feature selection. These insights can aid in the development of more effective and robust fraud detection systems, helping financial institutions and consumers better protect against the growing threat of credit card fraud.
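The SMOTE oversampling step at the heart of the study above can be illustrated with a minimal hand-rolled version (included because the imbalanced-learn library may not be available): each synthetic minority sample is an interpolation between a minority point and one of its k nearest minority-class neighbours.

```python
# Minimal SMOTE-style oversampler: interpolate between minority neighbours.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority-class rows X_min."""
    if rng is None:
        rng = np.random.default_rng(0)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, k + 1, n_new)]
    gap = rng.random((n_new, 1))             # position along the segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

rng = np.random.default_rng(0)
X_minority = rng.normal(size=(40, 4))        # 40 "fraud" rows, 4 features
X_synth = smote_like(X_minority, n_new=8, k=5, rng=rng)  # ~20% oversampling
print(X_synth.shape)  # (8, 4)
```

The n_new argument controls the oversampling rate; the study's finding was that a modest 20% rate, rather than full class balance, worked best for its models.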
- Research Article
- 10.4018/ijbdah.20210101.oa4
- Dec 7, 2020
- International Journal of Big Data and Analytics in Healthcare
This paper organizes a heart disease dataset from the UCI repository. The dataset describes the correlations between the variables and the class-level target variable, and the experiment analyzed these variables with different machine learning algorithms. Reviewing previous prediction work, the authors found that some machine learning algorithms did not work properly or did not reach 100% classification accuracy, owing to overfitting, underfitting, noisy data, and residual errors on the base-level decision tree. This research used Pearson correlation and chi-square feature-selection algorithms to measure the correlation strength of the heart disease attributes. The main objective was to achieve the highest classification accuracy with the fewest errors, so the authors used parallel and sequential ensemble methods to reduce the drawbacks above. The ensembles were built from decision tree-based learners: the J48 algorithm, reduced-error pruning, and the decision stump algorithm. The paper used the random forest ensemble method for parallel random selection in prediction, and various sequential ensemble meta-classifiers, namely AdaBoost, Gradient Boosting, and XGBoost. The experiment was divided into two parts. The first part combined J48, reduced-error pruning, and decision stump into a random forest ensemble; this parallel ensemble achieved 100% classification accuracy with low error. The second part combined J48, reduced-error pruning, and decision stump with three sequential ensemble methods, namely AdaBoostM1, XGBoost, and Gradient Boosting. The XGBoost ensemble produced better results (higher classification accuracy and lower error) than the AdaBoostM1 and Gradient Boosting ensembles, reaching 98.05% classification accuracy, while the random forest ensemble achieved 100% classification accuracy with low error.
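The parallel-versus-sequential contrast above can be sketched as follows: a random forest (trees grown independently and averaged) against AdaBoost and gradient boosting (trees fitted sequentially to the errors of their predecessors). The data is synthetic tabular data standing in for the UCI heart set, and sklearn's tree learners stand in for J48/REP/decision stump.

```python
# Sketch: parallel (bagging) vs. sequential (boosting) tree ensembles.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 13 columns, like the UCI heart attribute count; values are synthetic
X, y = make_classification(n_samples=1000, n_features=13, n_informative=6,
                           random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=4)

acc = {}
for name, model in [
        ("RandomForest (parallel)", RandomForestClassifier(random_state=4)),
        ("AdaBoost (sequential)", AdaBoostClassifier(random_state=4)),
        ("GradBoost (sequential)", GradientBoostingClassifier(random_state=4))]:
    acc[name] = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
print({k: round(v, 3) for k, v in acc.items()})
```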
- Research Article
- 10.62527/joiv.9.2.2888
- Mar 31, 2025
- JOIV : International Journal on Informatics Visualization
This study aims to evaluate the classification accuracy of a video-based system for Timed Up and Go (TUG) subtasks using human pose estimation through MediaPipe. Six participants were included in the validity study, all participating in the reliability study, performing various TUG subtasks. The research methodology involved acquiring video data that captured the participants' movements during the TUG activity. This video data was processed using the MediaPipe package to extract key points from each frame, resulting in a 2D skeletal representation. The dataset was imported in CSV format to train multiple machine learning algorithms. The dataset was partitioned into training data (70%) and test data (30%), and several machine learning models, including Stacking Ensemble, Hist Gradient Boosting, XGBoost, CATBoost, Random Forest, and Gradient Boosting, were evaluated for their effectiveness in classifying TUG subtasks. The evaluation was conducted by comparing the classification accuracy of each model with the posture detection outcomes and overall performance metrics. The results indicated that the Stacking Ensemble method achieved the highest overall accuracy (96.90%), outperforming models such as Hist Gradient Boosting (96.48%), XGBoost (95.63%), CATBoost (96.06%), Random Forest (95.92%), and Gradient Boosting (95.21%). Each classifier was evaluated across sub-activities, and the results consistently demonstrated the superior performance of the Stacking Ensemble. These findings suggest that the video-based system, when combined with advanced machine learning techniques and human pose estimation, is a reliable and accurate tool for measuring and classifying subtask movements in TUG among older adults.
- Research Article
- 10.1016/j.jgsce.2023.204916
- Feb 3, 2023
- Gas Science and Engineering
Productivity prediction in the Wolfcamp A and B using weighted voting ensemble machine learning method
- Research Article
- 10.29020/nybg.ejpam.v18i2.6087
- May 1, 2025
- European Journal of Pure and Applied Mathematics
The rapid spread of fake news in the digital age poses significant challenges, necessitating effective detection methods. This study presents a comprehensive evaluation of various ensemble and machine learning classifiers, combined with different feature selection techniques, to improve the accuracy and reliability of the detection of fake news. Using the TruthSeeker dataset, this research examines feature selection methods such as Recursive Feature Elimination (RFE), SelectKBest, Principal Component Analysis (PCA) and Genetic Algorithms (GA), analyzing their impact on model performance. Key metrics such as accuracy, precision, recall, F1 score, and AUC-ROC were used to assess the effectiveness of each classifier. The results reveal that ensemble methods, particularly Random Forest (RF) and Gradient Boosting, demonstrate superior performance, achieving high accuracy and AUC-ROC scores. Moreover, feature selection techniques like RFE and SelectKBest significantly improve model outcomes by optimizing the feature set, while PCA is less effective in this context. This study highlights the importance of integrating robust classifiers with optimal feature selection methods to improve the efficacy of fake news detection systems.
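The SelectKBest step evaluated above can be sketched as a text-classification pipeline: chi-squared scoring over TF-IDF features feeding a random forest. The tiny corpus and labels below are invented for illustration; the TruthSeeker dataset is not reproduced.

```python
# Sketch: TF-IDF -> chi-squared SelectKBest -> random forest, on toy text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

texts = ["official report confirms economic growth figures",
         "shocking miracle cure doctors dont want you to know",
         "senate passes budget bill after long debate",
         "celebrity secretly replaced by clone says insider",
         "study finds moderate exercise improves sleep",
         "aliens endorse candidate in leaked video"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = real, 0 = fake (toy labels)

clf = make_pipeline(TfidfVectorizer(),
                    SelectKBest(chi2, k=10),  # keep the 10 best-scoring terms
                    RandomForestClassifier(random_state=8))
clf.fit(texts, labels)
pred = clf.predict(["leaked video shows miracle cure"])
print(pred)
```

chi2 requires non-negative features, which is why it pairs naturally with TF-IDF counts; swapping SelectKBest for RFE or PCA in the same pipeline reproduces the study's other configurations.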
- Research Article
- 10.30574/wjaets.2024.11.1.0048
- Feb 28, 2024
- World Journal of Advanced Engineering Technology and Sciences
With the rapid adoption of cloud computing, securing cloud environments against cyber threats has become a critical challenge. Intrusion Detection Systems (IDS) play a pivotal role in identifying malicious activities, but traditional methods often struggle with the high dimensionality of data and evolving attack patterns in cloud ecosystems. This research proposes a novel approach to improve intrusion detection by leveraging ensemble learning and feature selection techniques. Ensemble learning combines multiple machine learning models to enhance detection accuracy and robustness, while feature selection reduces data dimensionality, improving computational efficiency and model performance. The study evaluates various ensemble methods, such as Random Forest, Gradient Boosting, and Stacking, alongside feature selection algorithms like Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA). Experiments are conducted on benchmark datasets, such as CICIDS2017 and NSL-KDD, to assess the effectiveness of the proposed framework. Results demonstrate that the integration of ensemble learning and feature selection significantly improves detection rates, reduces false positives, and enhances the scalability of IDS in cloud environments. This research contributes to advancing cloud security by providing a robust and efficient intrusion detection framework.
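The RFE step in the framework above can be sketched as follows: features are recursively dropped in batches, ranked by a random forest, before the final detector is fitted. Synthetic traffic-like data stands in for CICIDS2017/NSL-KDD.

```python
# Sketch: recursive feature elimination before an ensemble IDS classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 40 columns stand in for per-flow traffic statistics
X, y = make_classification(n_samples=1000, n_features=40, n_informative=10,
                           random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=6)

rfe = RFE(RandomForestClassifier(random_state=6),
          n_features_to_select=10, step=5)  # drop 5 features per round
rfe.fit(X_tr, y_tr)
clf = RandomForestClassifier(random_state=6).fit(rfe.transform(X_tr), y_tr)
acc = accuracy_score(y_te, clf.predict(rfe.transform(X_te)))
print(f"kept {rfe.n_features_} features, acc={acc:.3f}")
```

Reducing dimensionality before fitting is exactly the efficiency gain the study reports: the detector sees a quarter of the original columns with little or no accuracy loss.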