Random Forests

Abstract

Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges almost surely to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to AdaBoost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
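The abstract's main ingredients (one random vector per tree, random feature selection at each node, and internal error estimates) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes bootstrap resampling as the per-tree randomness, Gini-based threshold splits, majority voting, and an out-of-bag estimate standing in for the "internal estimates". All function names, the `mtry` parameter, and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(X, y, rows, features):
    """Best (feature, threshold, left, right) over the candidate features, or None."""
    best, best_score = None, gini([y[i] for i in rows])  # must beat the parent node
    for f in features:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(rows)
            if score < best_score:
                best, best_score = (f, t, left, right), score
    return best

def grow_tree(X, y, rows, mtry, rng, depth=8):
    """Grow one tree; a leaf is a class label, an internal node is a tuple."""
    labels = [y[i] for i in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), mtry)  # random feature subset per node
    split = best_split(X, y, rows, features)
    if split is None:
        return majority
    f, t, left, right = split
    return (f, t, grow_tree(X, y, left, mtry, rng, depth - 1),
            grow_tree(X, y, right, mtry, rng, depth - 1))

def predict_tree(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Each tree sees its own bootstrap sample; remember which rows were in-bag."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append((grow_tree(X, y, rows, mtry, rng), set(rows)))
    return forest

def predict_forest(forest, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, x) for t, _ in forest).most_common(1)[0][0]

def oob_error(forest, X, y):
    """Error rate when each row is voted on only by trees that never saw it."""
    wrong = counted = 0
    for i, x in enumerate(X):
        votes = Counter(predict_tree(t, x) for t, bag in forest if i not in bag)
        if votes:
            counted += 1
            wrong += votes.most_common(1)[0][0] != y[i]
    return wrong / counted if counted else float("nan")

# Toy data (hypothetical): the class depends on feature 0 only; feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
forest = fit_forest(X, y, n_trees=25, mtry=1)
err = oob_error(forest, X, y)
print("out-of-bag error estimate:", err)
```

Raising `mtry` makes individual trees stronger but more alike, which is the strength and correlation trade-off the abstract describes. The out-of-bag estimate lets that trade-off be monitored without a separate test set.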

Similar Papers
  • Research Article
  • Citations: 28
  • 10.1097/tp.0000000000002923
Seeing the Forest for the Trees: Random Forest Models for Predicting Survival in Kidney Transplant Recipients.
  • May 1, 2020
  • Transplantation
  • Ruth Sapir-Pichhadze + 1 more


  • Abstract
  • Citations: 3
  • 10.1016/j.gaitpost.2022.09.031
Machine Learning Approach to Support the Detection of Parkinson's Disease in IMU-Based Gait Analysis
  • Oct 1, 2022
  • Gait & Posture
  • D Trabassi + 7 more

Recent advances in data analysis and wearable sensors for human movement monitoring can promote objective and accurate clinical evaluation of neurological symptoms and improve outcome measures in clinical trials [1–3]. The aim of this study was to combine modern techniques of data analysis and wearable sensors to determine which supervised machine learning (ML) algorithm can most accurately classify people with Parkinson’s disease (pwPD) from speed-matched healthy subjects (HS) based on a selected minimum set of IMU-derived gait features. Twenty-two gait features were extrapolated from the trunk acceleration patterns of 81 pwPD and 80 HS, including spatiotemporal, pelvic kinematics, and acceleration-derived gait stability indexes. After a three-level feature selection procedure, seven gait features were considered for implementing five ML algorithms: support vector machine (SVM), artificial neural network (ANN), decision trees (DT), random forest (RF), and K-nearest neighbors (KNN). Accuracy, precision, recall, F1 score, AUC and generalization error were calculated. SVM outperformed the other ML algorithms in terms of classification metrics (test accuracy = 0.86; F1 score = 0.85; AUC = 0.85) and generalizability (generalization error = 2.95%) in classifying the gait impairment of pwPD compared with speed-matched healthy subjects, using a selected dataset of gait features based on lower trunk acceleration data. Although significantly lower than SVM, tree-based algorithms showed good classification performance with low generalization errors (RF: test accuracy = 0.86; F1 score = 0.85; AUC = 0.85), and lower computational demand than SVM. ANN was similar to DT in terms of classification metrics but showed significantly higher generalization error (7.26%) than tree-based algorithms and SVM and higher computational demand than the other ML algorithms. Even though KNN showed the fastest time performance, its classification metrics were the lowest.
We proposed a feature selection procedure based on the combination of filter, wrapper, embedded, and domain-specific methods that was effective in lowering the risk of overrepresenting multicollinear gait features in the model, resulting in a lower risk of overfitting in the test performances while increasing the explainability of the results. Because of their accurate results, their simplicity of understanding, and their explainability, DT and RF algorithms could represent useful tools for the comprehension of gait disorders by making clinicians participate in the decision process. This is the first time that the accuracy and generalizability of the most commonly used ML algorithms in classifying pwPD gait abnormalities based on gait data from a single lumbar-mounted IMU have been compared. The findings of this study could be used to incorporate machine learning algorithms into software that processes gait data from lumbar-mounted IMUs. Future research could focus on finding the best tree-based model for classification and prediction problems in gait analysis.

  • Research Article
  • Citations: 40
  • 10.1093/bioinformatics/btp640
Pathway analysis using random forests with bivariate node-split for survival outcomes
  • Nov 18, 2009
  • Bioinformatics
  • Herbert Pang + 2 more

There is great interest in pathway-based methods for genomics data analysis in the research community. Although machine learning methods, such as random forests, have been developed to correlate survival outcomes with a set of genes, no study has assessed the abilities of these methods in incorporating pathway information for analyzing microarray data. In general, genes that are identified without incorporating biological knowledge are more difficult to interpret. Correlating pathway-based gene expression with survival outcomes may lead to biologically more meaningful prognosis biomarkers. Thus, a comprehensive study on how these methods perform in a pathway-based setting is warranted. In this article, we describe a pathway-based method using random forests to correlate gene expression data with survival outcomes and introduce a novel bivariate node-splitting random survival forests. The proposed method allows researchers to identify important pathways for predicting patient prognosis and time to disease progression, and discover important genes within those pathways. We compared different implementations of random forests with different split criteria and found that bivariate node-splitting random survival forests with log-rank test is among the best. We also performed simulation studies that showed random forests outperforms several other machine learning algorithms and has comparable results with a newly developed component-wise Cox boosting model. Thus, pathway-based survival analysis using machine learning tools represents a promising approach in dissecting pathways and for generating new biological hypothesis from microarray studies. R package Pwayrfsurvival is available from URL: http://www.duke.edu/~hp44/pwayrfsurvival.htm. Supplementary data are available at Bioinformatics online.

  • Research Article
  • Citations: 5
  • 10.3389/fpsyt.2023.1266548
Prediction of patient admission and readmission in adults from a Colombian cohort with bipolar disorder using artificial intelligence.
  • Dec 21, 2023
  • Frontiers in Psychiatry
  • María Alejandra Palacios-Ariza + 8 more

Bipolar disorder (BD) is a chronically progressive mental condition, associated with a reduced quality of life and greater disability. Patient admissions are preventable events with a considerable impact on global functioning and social adjustment. While machine learning (ML) approaches have proven prediction ability in other diseases, little is known about their utility to predict patient admissions in this pathology. The objective was to develop prediction models for hospital admission/readmission within 5 years of diagnosis in patients with BD using ML techniques. The study utilized data from patients diagnosed with BD in a major healthcare organization in Colombia. Candidate predictors were selected from Electronic Health Records (EHRs) and included sociodemographic and clinical variables. ML algorithms, including Decision Trees, Random Forests, Logistic Regressions, and Support Vector Machines, were used to predict patient admission or readmission. Survival models, including a penalized Cox Model and Random Survival Forest, were used to predict time to admission and first readmission. Model performance was evaluated using accuracy, precision, recall, F1 score, area under the receiver operating characteristic curve (AUC) and concordance index. The admission dataset included 2,726 BD patients, with 354 admissions, while the readmission dataset included 352 patients, with almost half being readmitted. The best-performing model for predicting admission was the Random Forest, with an accuracy score of 0.951 and an AUC of 0.98. The variables with the greatest predictive power in the Recursive Feature Elimination (RFE) importance analysis were the number of psychiatric emergency visits, the number of outpatient follow-up appointments and age. Survival models showed similar results, with the Random Survival Forest performing best, achieving an AUC of 0.95.
However, the prediction models for patient readmission had poorer performance, with the Random Forest model being again the best performer but with an AUC below 0.70. ML models, particularly the Random Forest model, outperformed traditional statistical techniques for admission prediction. However, readmission prediction models had poorer performance. This study demonstrates the potential of ML techniques in improving prediction accuracy for BD patient admissions.

  • Research Article
  • Citations: 203
  • 10.1002/widm.1114
Mining data with random forests: current options for real‐world applications
  • Dec 23, 2013
  • WIREs Data Mining and Knowledge Discovery
  • Andreas Ziegler + 1 more

Random Forests are fast, flexible, and represent a robust approach to mining high‐dimensional data. They are an extension of classification and regression trees (CART). They perform well even in the presence of a large number of features and a small number of observations. In analogy to CART, random forests can deal with continuous outcome, categorical outcome, and time‐to‐event outcome with censoring. The tree‐building process of random forests implicitly allows for interaction between features and high correlation between features. Approaches are available for measuring variable importance and reducing the number of features. Although random forests perform well in many applications, their theoretical properties are not fully understood. Recently, several articles have provided a better understanding of random forests, and we summarize these findings. We survey different versions of random forests, including random forests for classification, random forests for probability estimation, and random forests for estimating survival data. We discuss the consequences of (1) no selection, (2) random selection, and (3) a combination of deterministic and random selection of features for random forests. We also review a backward elimination and a forward procedure, the determination of trees representing a forest, and the identification of important variables in a random forest. Finally, we provide a brief overview of different areas of application of random forests. WIREs Data Mining Knowl Discov 2014, 4:55–63. doi: 10.1002/widm.1114. This article is categorized under: Algorithmic Development > Statistics; Application Areas > Data Mining Software Tools; Technologies > Classification; Technologies > Machine Learning

  • Research Article
  • 10.1371/journal.pone.0318167
Data-driven survival modeling for breast cancer prognostics: A comparative study with machine learning and traditional survival modeling methods.
  • Apr 22, 2025
  • PloS one
  • Theophilus Gyedu Baidoo + 1 more

Background: This investigation delves into the potential application of data-driven survival modeling approaches for prognostic assessments of breast cancer survival. The primary objective is to evaluate and compare the ability of machine learning (ML) models and conventional survival analysis techniques to identify consistent key predictors of breast cancer survival outcomes. Methods: This study employs data-driven survival modeling approaches to predict breast cancer survival, including survival-specific methods such as the Cox Proportional Hazards (CPH) model, Random Survival Forests (RSF), and Cox Proportional Deep Neural Networks (DeepSurv), as well as machine learning models like Random Forests (RF), XGBoost, Support Vector Machines (SVM) with an RBF Kernel, and LightGBM. The dataset, sourced from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program, comprises 4,024 women diagnosed with infiltrating duct and lobular carcinoma breast cancer between 2006 and 2010. To ensure interpretability across all models, the Shapley Additive Explanation (SHAP) method was applied to RSF, DeepSurv, Random Forests (RF), and XGBoost. This enabled the identification of key predictors influencing breast cancer survival, highlighting consistent factors across models while uncovering unique insights specific to each approach. Results: The performance of survival-specific and ML models was evaluated using the Concordance index (C-index), Integrated Brier Score (IBS), mean accuracy, and mean AUC. The CPH model achieved a C-index of 0.71±0.015 and an IBS of 0.08±0.006, while RSF demonstrated slightly better discriminatory power with a C-index of 0.72±0.0117. DeepSurv performed comparably, with a C-index of 0.71±0.0095 and an IBS of 0.09±0.0008. Both Cox and RSF models achieved the lowest IBS (0.08), indicating accurate survival probability predictions over time.
For ML models, RF achieved a mean AUC of 0.74±0.0021 and XGBoost a mean AUC of 0.69±0.0183, reflecting fair discriminatory ability but not accounting for censoring in survival data. SHAP analysis for the top-performing models highlighted the extent of lymph node involvement, Regional Node-Positive (number of affected lymph nodes), tumor grade (cell abnormality and growth rate), progesterone status, and age as key predictors of breast cancer survival outcomes. Conclusions: While ML models like XGBoost and RF can effectively identify important predictors and patterns in breast cancer outcomes, survival-specific methods such as the Cox model, RSF, and DeepSurv provide essential capabilities for handling time-to-event data and censoring, making them more suitable for accurate survival predictions. The primary objective of including ML models in this analysis was to leverage their interpretability in identifying key variables alongside survival-specific models, rather than to directly compare their performance against survival models. By examining both ML and survival models, this research highlights the complementary strengths of each approach. This study contributes to the integration of artificial intelligence in healthcare, emphasizing the value of data-driven survival modeling techniques in supporting healthcare professionals with accurate, personalized, and actionable insights for high-risk patients. Together, these approaches enhance the precision of survival predictions, paving the way for more informed clinical decision-making and improved patient care.
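Several of the abstracts in this list report a concordance index (C-index) for censored survival data. The sketch below shows one common way to compute it (Harrell's C), assuming higher risk scores mean shorter expected survival and that tied risk scores count as half-concordant. The function name and the toy values are hypothetical, and the papers listed here may use a different estimator or tie-handling rule.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs ranked correctly by the risk score.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the subject was censored
    risk   -- predicted risk scores (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if the earlier time is an observed event;
            # a censored subject tells us nothing about who failed first.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy example (hypothetical values): the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)  # 4 of 5 comparable pairs are ranked correctly -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Because censored subjects only enter pairs as the later member, the measure accounts for censoring in a way that a plain classification AUC does not.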

  • Research Article
  • Citations: 54
  • 10.1186/s12874-020-01153-1
Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques
  • Nov 16, 2020
  • BMC Medical Research Methodology
  • Georgios Kantidakis + 5 more

Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML is related to unsuitable performance measures and lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to large data of 62294 patients from the United States with 97 predictors selected on clinical/statistical grounds, out of more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. Clinical endpoint is overall graft-survival defined as the time between transplantation and the date of graft-failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index.
Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. From the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables. Trial registration: Retrospective data were provided by the Scientific Registry of Transplant Recipients under Data Use Agreement number 9477 for analysis of risk factors after liver transplantation.

  • Research Article
  • Citations: 66
  • 10.1177/0962280213502437
A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies
  • Sep 18, 2013
  • Statistical Methods in Medical Research
  • Mizanur Khondoker + 4 more

Background: Recent literature on the comparison of machine learning methods has raised questions about the neutrality, unbiasedness and utility of many comparative studies. Reporting of results on favourable datasets and sampling error in the estimated performance measures based on single samples are thought to be the major sources of bias in such comparisons. Better performance in one or a few instances does not necessarily imply so on an average or on a population level and simulation studies may be a better alternative for objectively comparing the performances of machine learning algorithms. Methods: We compare the classification performance of a number of important and widely used machine learning algorithms, namely the Random Forests (RF), Support Vector Machines (SVM), Linear Discriminant Analysis (LDA) and k-Nearest Neighbour (kNN). Using massively parallel processing on high-performance supercomputers, we compare the generalisation errors at various combinations of levels of several factors: number of features, training sample size, biological variation, experimental variation, effect size, replication and correlation between features. Results: For a smaller number of correlated features, with the number of features not exceeding approximately half the sample size, LDA was found to be the method of choice in terms of average generalisation errors as well as stability (precision) of error estimates. SVM (with RBF kernel) outperforms LDA as well as RF and kNN by a clear margin as the feature set gets larger provided the sample size is not too small (at least 20). The performance of kNN also improves as the number of features grows and outplays that of LDA and RF unless the data variability is too high and/or effect sizes are too small. RF was found to outperform only kNN in some instances where the data are more variable and have smaller effect sizes, in which cases it also provides more stable error estimates than kNN and LDA.
Applications to a number of real datasets supported the findings from the simulation study.

  • Conference Article
  • Citations: 16
  • 10.1109/chicc.2016.7554310
Terrain classification in field environment based on Random Forest for the mobile robot
  • Jul 1, 2016
  • Hui Zhang + 3 more

The inherent topographical diversity of field environments makes it difficult to evaluate the traversability of the terrain for mobile robot navigation. In order to guarantee the real-time performance and adaptability of the terrain classification process, we propose a novel terrain classification method based on Random Forest. This method first extracts a large number of candidate features, including color, texture and geometric ones, from which a small subset of features highly relevant to the specific type of terrain can then be effectively picked out using our Random Forest-based online feature selection algorithm. This algorithm serves as the cornerstone of our classification method, exploiting the fact that the importance of the feature variables and the generalization error can be calculated during the training process of the random forest classifier. The selected feature subset is then used to train a random forest classifier for evaluating the traversability of the terrain. The experimental results show that our feature selection method based on Random Forest can effectively extract the feature subset highly relevant to terrain, leading to the proposed classification algorithm achieving high accuracy and ideal classification speed.

  • Research Article
  • Citations: 59
  • 10.3390/rs12132110
Leaf Area Index Estimation Algorithm for GF-5 Hyperspectral Data Based on Different Feature Selection and Machine Learning Methods
  • Jul 1, 2020
  • Remote Sensing
  • Zhulin Chen + 10 more

Leaf area index (LAI) is an essential vegetation parameter that represents the light energy utilization and vegetation canopy structure. As the only in-operation hyperspectral satellite launched by China, GF-5 is potentially useful for accurate LAI estimation. However, no research has focused on evaluating GF-5 data for LAI estimation. Hyperspectral remote sensing data contains abundant information about the reflective characteristics of vegetation canopies, but this abundance of data also easily results in a curse of dimensionality. Therefore, feature selection (FS) is necessary to reduce data redundancy to achieve more reliable estimations. Currently, machine learning (ML) algorithms have been widely used for FS. Moreover, the same ML algorithm is usually used for both FS and regression in LAI estimation. However, no evidence suggests that this is the optimal solution. Therefore, this study focuses on evaluating the capacity of GF-5 spectral reflectance for estimating LAI and the performances of different combinations of FS and ML algorithms. Firstly, the PROSAIL model, which couples the leaf optical properties model PROSPECT and the scattering by arbitrarily inclined leaves (SAIL) model, was used to generate simulated GF-5 reflectance data under different vegetation and soil conditions, and then three FS methods, including random forest (RF), K-means clustering (K-means) and mean impact value (MIV), and three ML algorithms, including random forest regression (RFR), back propagation neural network (BPNN) and K-nearest neighbor (KNN), were used to develop nine LAI estimation models. The FS process was conducted twice using different strategies: Firstly, three FS methods were conducted to search for the lowest dimension number that maintained the estimation accuracy of all bands. Then, the sequential backward selection (SBS) method was used to eliminate the bands having minimal impact on LAI estimation accuracy.
Finally, the three best estimation models were selected and evaluated using reference LAI. The results showed that although the RF_RFR model (RF used for feature selection and RFR used for regression) achieved reliable LAI estimates (coefficient of determination (R2) = 0.828, root mean square error (RMSE) = 0.839), the poor performance (R2 = 0.763, RMSE = 0.987) of the MIV_BPNN model (MIV used for feature selection and BPNN used for regression) suggested that performing feature selection and regression with the same ML algorithm does not always ensure an optimal estimation. Moreover, RF selection preserved the most informative bands for LAI estimation so that each ML regression method could achieve satisfactory estimation results. Finally, the results indicated that the RF_KNN model (RF used for feature selection and KNN used for regression) with seven GF-5 spectral band reflectances achieved better estimation results than the others when validated by simulated data (R2 = 0.834, RMSE = 0.824) and actual reference LAI (R2 = 0.659, RMSE = 0.697).

  • Research Article
  • Citations: 4
  • 10.1109/access.2022.3233194
Random Interaction Forest (RIF)–A Novel Machine Learning Strategy Accounting for Feature Interaction
  • Jan 1, 2023
  • IEEE Access
  • Chao-Yu Guo + 1 more

If an interaction exists in medical and health sciences, a proper statistical approach is required to avoid an erroneous conclusion. For example, different genders may introduce modified therapeutic effects of drugs, or an adverse interaction between two medicines changes the pharmacological activity, reduces the therapeutic effect, or induces toxicity. Therefore, if the analysis does not account for the impact of the interaction, it may introduce significant prediction errors or bias. Regression models deal with a two-way interaction by adding the product of the two interactive variables. Since machine learning models demonstrate a superior predictive ability to regression models, this study proposes a new method based on the random forest to account for interaction, called random interaction forest (RIF). This new strategy modifies the structure of the random forest, where the interaction features are forced to be in the first two nodes. Simulation studies examined the predictive ability of the linear regression model, logistic regression model, random forest, and the RIF under various scenarios. The results showed that the RIF consistently outperforms random forest and logistic regression when interactions are present. The RIF also performs better in many scenarios than the linear regression model. When the effect of interaction is more significant, the performance of RIF could be superior.
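The statement that regression models handle a two-way interaction by adding the product of the two variables can be written out as follows. This is a generic illustration with two predictors; the symbols are not taken from the paper.

```latex
% Linear model with a two-way interaction between x_1 and x_2
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon

% The effect of x_1 then depends on the level of x_2:
\frac{\partial\, \mathbb{E}[y \mid x_1, x_2]}{\partial x_1} = \beta_1 + \beta_3 x_2
```

When \beta_3 = 0 the model reduces to purely additive effects. A nonzero \beta_3 is what lets, for example, a drug effect differ between two groups.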

  • Research Article
  • Citations: 15
  • 10.1109/access.2017.2656618
Cooperative Profit Random Forests With Application in Ocean Front Recognition
  • Jan 1, 2017
  • IEEE Access
  • Jianyuan Sun + 4 more

Random Forests are powerful classification and regression tools that are commonly applied in machine learning and image processing. In the majority of random classification forests algorithms, the Gini index and the information gain ratio are commonly used for node splitting. However, these two kinds of node-split methods may pay less attention to the intrinsic structure of the attribute variables and fail to find attributes that have strong discriminative ability as a group yet are weak as individuals. In this paper, we propose an innovative method for splitting the tree nodes based on cooperative game theory, from which some attributes with good discriminative ability as a group can be learned. This new random forests algorithm is called Cooperative Profit Random Forests (CPRF). Experimental comparisons with several other existing random classification forests algorithms are carried out on several real-world data sets, including remote sensing images. The results show that CPRF outperforms other existing Random Forests algorithms in most cases. In particular, CPRF achieves promising results in ocean front recognition.
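For reference, the two conventional node-split criteria named in this abstract can be sketched as below. This illustrates only the standard Gini decrease and the C4.5-style gain ratio, not the CPRF method. The function names and example labels are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini_decrease(parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)

def gain_ratio(parent, children):
    """Information gain divided by the split information (C4.5-style)."""
    n = len(parent)
    gain = entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children)
    return gain / split_info if split_info > 0 else 0.0

# A split that separates the two classes perfectly (hypothetical labels).
parent = [0, 0, 0, 1, 1, 1]
children = [[0, 0, 0], [1, 1, 1]]
print(gini_decrease(parent, children))  # 0.5
print(gain_ratio(parent, children))     # 1.0
```

Both criteria score one candidate attribute at a time. That is the limitation the abstract points to for attributes that are informative only jointly.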

  • Research Article
  • Citations: 12
  • 10.17694/bajece.502156
A Meta-Ensemble Classifier Approach: Random Rotation Forest
  • Apr 30, 2019
  • Balkan Journal of Electrical and Computer Engineering
  • Erdal Taşci

Ensemble learning is a popular and intensively studied field in machine learning and pattern recognition that aims to increase classification performance. Random Forest is valued for giving fast and effective results. On the other hand, Rotation Forest can achieve better performance than Random Forest. In this study, we present a meta-ensemble classifier, called Random Rotation Forest, to combine the advantages of the two classifiers (Rotation Forest and Random Forest). In the experimental studies, we use three base learners (namely, J48, REPTree, and Random Forest) and two meta-learners (namely, Bagging and Rotation Forest) for ensemble classification on five datasets from the UCI Machine Learning Repository. The experimental results indicate that Random Rotation Forest gives promising results compared with the base learners and bagging ensemble approaches in terms of accuracy rates, AUC, precision and recall values. Our method can be used for image/pattern recognition and machine learning problems.

  • Research Article
  • Citations: 53
  • 10.1371/journal.pone.0208737
Computational prediction of diagnosis and feature selection on mesothelioma patient health records.
  • Jan 10, 2019
  • PLOS ONE
  • Davide Chicco + 1 more

Background: Mesothelioma is a lung cancer that kills thousands of people worldwide annually, especially those with exposure to asbestos. Diagnosis of mesothelioma in patients often requires time-consuming imaging techniques and biopsies. Machine learning can provide for a more effective, cheaper, and faster patient diagnosis and feature selection from clinical data in patient records. Methods and findings: We analyzed a dataset of health records of 324 patients having mesothelioma symptoms from Turkey. The patients had prior asbestos exposure and displayed symptoms consistent with mesothelioma. We compared probabilistic neural network, perceptron-based neural network, random forest, one rule, and decision tree classifiers to predict diagnosis of the patient records. We measured classifiers’ performance through standard confusion matrix scores such as Matthews correlation coefficient (MCC). Random forest outperformed all models tried, obtaining MCC = +0.37 on the complete imbalanced dataset and MCC = +0.64 on the under-sampled balanced dataset. We then employed random forest feature selection to identify the two most relevant dataset traits associated with mesothelioma: lung side and platelet count. These two risk factors proved so predictive that a decision tree focusing on them achieved the second top accuracy on the complete dataset diagnosis prediction (MCC = +0.28), outperforming all other methods and even the decision tree itself applied to all features. Conclusions: Our results show that machine learning can predict diagnoses of patients having mesothelioma symptoms with high accuracy, sensitivity, and specificity, in a few minutes. Additionally, random forest can efficiently select the most important features of this clinical dataset (lung side and platelet count) in a few seconds.
The importance of pleural plaques in lung sides and blood platelets in mesothelioma diagnosis indicates that physicians should focus on these two features when reading records of patients with mesothelioma symptoms. Moreover, doctors can exploit our machinery to predict patient diagnosis when only lung side and platelet data are available.
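The Matthews correlation coefficient used in this abstract is computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula. The function names and the example counts are hypothetical and are not taken from the study. Returning 0 when a marginal total is zero is a common convention that is assumed here.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

def accuracy(tp, fp, tn, fn):
    """Share of correct predictions."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical imbalanced case: 95 positives and 5 negatives.
# The classifier finds most positives but only 1 of the 5 negatives.
counts = dict(tp=90, fp=4, tn=1, fn=5)
print(accuracy(**counts))  # 0.91
print(mcc(**counts))       # roughly 0.135
```

The gap between the two numbers shows why MCC is often preferred for imbalanced datasets: accuracy is dominated by the majority class, while MCC is high only when both classes are predicted well.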

  • Research Article
  • 10.17485/ijst/v17i45.2728
Leveraging Machine and Deep Learning Models for Load Balancing Strategies in Cloud Computing
  • Dec 14, 2024
  • Indian Journal Of Science And Technology
  • C Thilagavathy

Objectives: To evaluate the efficiency of task prediction and resource allocation for load balancing (LB) in the cloud environment using a combined approach: Random Forest (RF) for task prediction, and Particle Swarm Optimization with Convolutional Neural Networks (PSO-CNN) for optimization, resource prediction, and allocation. Methods: The ensemble approach in the present study uses Random Forest (RF), a machine learning (ML) model, for task prediction, and Particle Swarm Optimization (PSO), a bio-inspired algorithm, together with a CNN, a Deep Learning (DL) model, for optimization and resource allocation. The study employs PSO techniques to optimize the CNN in order to investigate algorithmic optimization in DL. The results show that the suggested model outperforms other models such as CNN-LSTM (Long Short-Term Memory), CNN-GRU (Gated Recurrent Unit), and PSO-SVM (Support Vector Machine), increasing the performance and efficacy of cloud systems. The experiment is implemented using Python and assessed using the publicly accessible Google Cluster dataset. Findings: ML and DL techniques are found to be more efficient in cloud infrastructure than conventional methods. The study examines the performance of the RF, PSO and CNN models and the hybrid RF-PSO-CNN model. Accuracy, precision, and F1 score were used to assess the performance of the classification models. The recommended RF-PSO-CNN model, with an accuracy of 90%, outperforms the compared methods CNN-LSTM, CNN-GRU and PSO-SVM. As a result, both the classification assessment metrics and resource consumption show that the proposed model performs effectively. Novelty: The novel ensemble approach proposes the combined RF-PSO-CNN for LB in cloud computing. The task predicted by RF is assigned to the resource chosen by PSO and CNN, thereby improving the efficiency of task prediction and resource allocation.
Most of the research uses any two ML or DL methods for either predicting the tasks to be scheduled or which resource to allocate. The study uses a combination of the ML (RF) method, bio-inspired algorithm (PSO) and a DL (CNN) model for both task and resource prediction concurrently and it examines the effectiveness of LB in the cloud context. Keywords: Load Balancing (LB), Task scheduling, Resource allocation, Random Forest (RF), Convolutional Neural Networks (CNN), Particle Swarm Optimization (PSO)
