Hyperparameter Optimization of Random Forest for 5G Coverage Prediction

Abstract

The use of 5G technology has become a major focus in the development of more advanced and efficient telecommunications networks. In this context, 5G coverage prediction is an important part of network planning for ensuring a good user experience. In this study, we explore the use of the Random Forest algorithm to predict 5G coverage, with particular emphasis on hyperparameter optimization to improve model performance. We conduct experiments with various combinations of the hyperparameters 'max_depth', 'max_features', 'min_samples_leaf', 'min_samples_split', and 'n_estimators', using hyperparameter optimization techniques. The results show that selecting the optimal combination of hyperparameters significantly improves model performance. The optimized model achieves a minimum Root Mean Squared Error (RMSE) of 0.6, considerably lower than the RMSE of 1.14 obtained by the Random Forest model without hyperparameter optimization. These results confirm the importance of hyperparameter optimization in improving the accuracy and consistency of the Random Forest model for 5G coverage prediction, and they have important implications for the development of successful 5G network infrastructure.
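
The search described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract names the five hyperparameters but reports neither the candidate values nor the search technique, so the values in SEARCH_SPACE, the choice of random search, and the helper names (all_configs, random_search, relative_reduction) are all assumptions. The hyperparameter names happen to match those of scikit-learn's RandomForestRegressor, although the abstract does not say which library was used. The scoring function is left pluggable; in practice it would train a Random Forest with the given configuration and return its validation RMSE.

```python
import itertools
import math
import random

# Hypothetical search space: the five hyperparameter names come from the
# abstract, but the candidate values below are illustrative assumptions.
SEARCH_SPACE = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20],
    "max_features": ["sqrt", "log2", 1.0],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def all_configs(space):
    """Yield every combination in the search space (an exhaustive grid)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_search(space, score_fn, n_iter, seed=0):
    """Sample n_iter configurations; return (best_config, best_score).

    score_fn maps a configuration dict to an error value (lower is better),
    for example the validation RMSE of a model trained with that configuration.
    """
    rng = random.Random(seed)
    best_config, best_score = None, math.inf
    for _ in range(n_iter):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def relative_reduction(baseline, tuned):
    """Fractional error reduction of the tuned model relative to the baseline."""
    return (baseline - tuned) / baseline


if __name__ == "__main__":
    # Toy stand-in for "train a model and return its validation RMSE".
    def toy_score(config):
        return 1.0 / config["n_estimators"] + 0.01 * config["min_samples_leaf"]

    best, score = random_search(SEARCH_SPACE, toy_score, n_iter=25)
    print("grid size:", sum(1 for _ in all_configs(SEARCH_SPACE)))
    print("best config:", best, "score:", score)
    print("RMSE reduction reported in the abstract:",
          relative_reduction(1.14, 0.6))
```

Applied to the figures reported in the abstract, relative_reduction(1.14, 0.6) gives a reduction in RMSE of roughly 47% for the optimized model compared with the untuned one. Random search is used here only because it keeps the sketch short; an exhaustive pass over all_configs would serve equally well on a grid this small.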

Similar Papers
  • Conference Article
  • Cited by 1
  • 10.2118/214253-ms
Automated Hyperparameter Optimization of Convolutional Neural Network (CNN) for First-Break (FB) Arrival Picking
  • Mar 13, 2023
  • Mohammed Ayub + 1 more

The Convolutional Neural Network (CNN) has been used successfully to enhance automated first-break (FB) arrival picking of seismic data. Determining an optimized FB model is challenging because many hyperparameter (HP) combinations must be considered. Tuning the most important HPs manually is infeasible because of the large number of HP combinations to be tested. Three state-of-the-art automated hyperparameter optimization (HPO) techniques are applied to a CNN model for robust FB arrival picking classification. A CNN model with 4 convolutional (Conv) layers followed by one fully connected (FC) layer and one output layer is designed to classify the seismic event as FB or non-FB. To control overfitting, dropout (DO) and batch normalization are used after every two Conv layers, and a DO layer alone is used after the FC layer. The number and size of kernels, DO rate, learning rate (Lr), and the number of neurons in the FC layer are fine-tuned using random search, Bayesian, and Hyperband HPO techniques. The findings are experimentally evaluated and compared in terms of four classification performance metrics. The five hyperparameters mentioned above are fine-tuned in 13 search spaces for each of the three HPO techniques. From experimental results, applying random search HPO to CNN yields the best accuracy and F1-score of 96.26%, with the best HP combination of 16, 16, 32, and 64 for numbers of kernels in four Conv layers respectively; 2, 2, 2, 5 for the size of kernels in each Conv layer; 0, 0.45, 0.25 for DO rate in each of DO layers; 240 for numbers of neurons in FC layer; and 0.000675 for Lr. In terms of loss on test data, the above combination of HP gives the lowest test loss of 0.1191 among all techniques, making it a robust model. This model outperforms all the other models in terms of precision (96.27%) and recall. Moreover, all HPO models outperformed the baseline in terms of all metrics.
The use of DO after Conv layers and FC layers is highly recommended. Moreover, a relatively small kernel size (i.e., 2) produces the best classification performance. According to the best HP combination results, there is also no harm in using a relatively higher number of neurons in the FC layer than in the Conv layers for FB arrival picking classification. The optimal values of Lr range from 0.0001 to 0.000675 depending on the HPO technique. The model developed in this study improves the accuracy of the auto-picking of FB seismic data, and it is anticipated that the model will be used more widely in future studies in the processing of seismic data.

  • Research Article
  • Cited by 1
  • 10.3233/jifs-219376
Hyperparameter optimization approaches to improve the performance of machine learning models for cardiovascular risk prediction
  • Apr 2, 2024
  • Journal of Intelligent & Fuzzy Systems
  • Eduardo Sánchez-Jiménez + 6 more

Machine learning algorithms have been used in diverse application areas, including healthcare. However, to fit an effective and optimal machine learning model, the hyperparameters need to be tuned. This process is commonly referred to as Hyperparameter Optimization and comprises several approaches. We combined three Hyperparameter Optimization techniques (Bayesian Optimization, Particle Swarm Optimization, and Genetic Algorithm) with three classifiers (Random Forest, Support Vector Machine, and XGBoost) to identify the combination of hyperparameters that maximizes model performance. We use the Framingham dataset to test the proposal. For classifier performance, the Support Vector Machine obtained the best result in recall (96.40%) and F-score (93.86%), while XGBoost obtained the best result in precision (96.30%) and specificity (96.36%). In the accuracy metric, both classifiers achieved 95%. Bayesian optimization had the best results in terms of accuracy, precision, specificity, and F-score metrics. Both Particle Swarm Optimization and Genetic Algorithm obtained the best result in the recall metric.

  • Research Article
  • Cited by 352
  • 10.1016/j.enggeo.2020.105972
Assessment of landslide susceptibility mapping based on Bayesian hyperparameter optimization: A comparison between logistic regression and random forest
  • Dec 16, 2020
  • Engineering Geology
  • Deliang Sun + 3 more

  • Research Article
  • Cited by 57
  • 10.3389/feart.2023.1112105
Prediction of compressive strength of recycled aggregate concrete using machine learning and Bayesian optimization methods
  • Feb 3, 2023
  • Frontiers in Earth Science
  • Xinyi Zhang + 3 more

With the sustainable development of the construction industry, recycled aggregate (RA) has been widely used in concrete preparation to reduce the environmental impact of construction waste. Compressive strength is an essential measure of the performance of recycled aggregate concrete (RAC). In order to understand the correspondence between relevant factors and the compressive strength of recycled concrete and accurately predict the compressive strength of RAC, this paper establishes a model for predicting the compressive strength of RAC using machine learning and hyperparameter optimization techniques. Using RAC experimental data from published literature as the dataset, RAC compressive strength prediction models were developed with extreme gradient boosting (XGBoost), random forest (RF), K-nearest neighbour (KNN), support vector regression (SVR), and gradient boosted decision tree (GBDT). The models were validated and compared using correlation coefficients (R2), Root Mean Square Error (RMSE), mean absolute error (MAE), and the gap between the experimental results and the predicted outcomes. In particular, the effects of different hyperparameter optimization techniques (Grid search, Random search, Bayesian optimization with Tree-structured Parzen Estimator, Bayesian optimization with Gaussian Process Regression) on model prediction efficiency and prediction accuracy were investigated. The results show that the optimal combination of hyperparameters can be searched in the shortest time using the Bayesian optimization algorithm based on TPE (Tree-structured Parzen Estimator); the BO-TPE-GBDT RAC compressive strength prediction model has higher prediction accuracy and generalisation ability. This high-performance compressive strength prediction model provides a basis for RAC’s research and practice and a new way to predict the performance of RAC.

  • Research Article
  • Cited by 6
  • 10.1038/s41598-025-03868-x
Distributed denial of service (DDoS) classification based on random forest model with backward elimination algorithm and grid search algorithm
  • May 30, 2025
  • Scientific Reports
  • Mohamed S Sawah + 4 more

Distributed Denial of Service (DDoS) attacks pose significant threats to network security, disrupting critical services by overwhelming targeted systems with malicious traffic. In this study, a machine learning-based approach is proposed to classify DDoS attacks using multiple classification models, including Random Forest (RF), Naïve Bayes (NB), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), and Support Vector Machine (SVM). The DDoS-SDN dataset was used for training and evaluation, with feature selection via Backward Elimination (BE) and hyperparameter tuning using Grid Search with 5-fold Cross-Validation (CV = 5). Experimental results demonstrate a significant improvement in classification performance after feature selection and parameter optimization, with RF achieving the highest accuracy of 99.99%. In this study, we propose a machine learning-based classification framework enhanced by feature selection and hyperparameter optimization techniques through employing Recursive Feature Elimination (RFE) and Grid Search. Our model based on Random Forest (RF) achieved a remarkable accuracy of 99.99%, outperforming other baseline classifiers, including Naive Bayes (98.85%), K-Nearest Neighbors (97.90%), Linear Discriminant Analysis (97.10%), and Support Vector Machine (95.70%). In addition to accuracy, the RF model also demonstrated superior F1 score, recall, and precision, each reaching 99.99%. These results validate the effectiveness of our optimization strategy in improving classification performance. The study highlights the effectiveness of feature engineering and model optimization in enhancing DDoS detection accuracy, making machine learning a viable solution for real-time cybersecurity applications.

  • Research Article
  • Cited by 5
  • 10.1186/s12302-024-00841-9
Gaussian mutation–orca predation algorithm–deep residual shrinkage network (DRSN)–temporal convolutional network (TCN)–random forest model: an advanced machine learning model for predicting monthly rainfall and filtering irrelevant data
  • Jan 12, 2024
  • Environmental Sciences Europe
  • Mohammad Ehteram + 3 more

Monitoring water resources requires accurate predictions of rainfall data. Our study introduces a novel deep learning model named the deep residual shrinkage network (DRSN)—temporal convolutional network (TCN) to remove redundant features and extract temporal features from rainfall data. The TCN model extracts temporal features, and the DRSN enhances the quality of the extracted features. Then, the DRSN–TCN is coupled with a random forest (RF) model to model rainfall data. Since the RF model may be unable to classify and predict complex patterns and data, our study develops the RF model to model outputs with high accuracy. Since the DRSN–TCN model uses advanced operators to extract temporal features and remove irrelevant features, it can improve the performance of the RF model for predicting rainfall. We use a new optimizer named the Gaussian mutation (GM)–orca predation algorithm (OPA) to set the DRSN–TCN–RF (DTR) parameters and determine the best input scenario. This paper introduces a new machine learning model for rainfall prediction, improves the accuracy of the original TCN, and develops a new optimization method for input selection. The models used the lagged rainfall data to predict monthly data. GM–OPA improved the accuracy of the orca predation algorithm (OPA) for feature selection. The GM–OPA reduced the root mean square error (RMSE) values of OPA and particle swarm optimization (PSO) by 1.4%–3.4% and 6.14–9.54%, respectively. The GM–OPA can simplify the modeling process because it can determine the most important input parameters. Moreover, the GM–OPA can automatically determine the optimal input scenario. The DTR reduced the testing mean absolute error values of the TCN–RAF, DRSN–TCN, TCN, and RAF models by 5.3%, 21%, 40%, and 46%, respectively. Our study indicates that the proposed model is a reliable model for rainfall prediction.

  • Research Article
  • Cited by 9
  • 10.3390/f12010048
A Crown Contour Envelope Model of Chinese Fir Based on Random Forest and Mathematical Modeling
  • Dec 31, 2020
  • Forests
  • Yingze Tian + 5 more

The tree crown is an important part of a tree and is closely related to forest growth status, forest canopy density, and other forest growth indicators. Chinese fir (Cunninghamia lanceolata (Lamb.) Hook) is an important tree species in southern China. A three-dimensional (3D) visualization assistant decision-making system of plantations could be improved through the construction of crown contour envelope models (CCEMs), which could aid plantation production. The goal of this study was to establish CCEMs, based on random forest and mathematical modeling, and to compare them. First, the regression equation of a tree crown was calculated using the least squares method. Then, forest characteristic factors were screened using methods based on mutual information, recursive feature elimination, least absolute shrinkage and selection operator, and random forest, and the random forest model was established based on the different screening results. The accuracy of the random forest model was higher than that of the mathematical modeling. The best performing model based on mathematical modeling was the quartic polynomial with the largest crown radius as the variable (R-squared (R2) = 0.8614 and root mean square error (RMSE) = 0.2657). Among the random forest regression models, the regression model constructed using mutual information as the feature screening method was the most accurate (R2 = 0.886, RMSE = 0.2406), which was two percentage points higher than mathematical modeling. Compared with mathematical modeling, the random forest model can reflect the differences among trees and aid 3D visualization of a Chinese fir plantation.

  • Research Article
  • Cited by 65
  • 10.1016/j.compag.2022.107512
Prediction of soil salinity parameters using machine learning models in an arid region of northwest China
  • Nov 25, 2022
  • Computers and Electronics in Agriculture
  • Chao Xiao + 8 more

  • Research Article
  • Cited by 5
  • 10.3389/fpubh.2022.910479
Prediction of Lumbar Drainage-Related Meningitis Based on Supervised Machine Learning Algorithms
  • Jun 28, 2022
  • Frontiers in Public Health
  • Peng Wang + 6 more

Background: Lumbar drainage is widely used in the clinic; however, forecasting lumbar drainage-related meningitis (LDRM) is limited. We aimed to establish prediction models using supervised machine learning (ML) algorithms. Methods: We utilized a cohort of 273 eligible lumbar drainage cases. Data were preprocessed and split into training and testing sets. Optimal hyper-parameters were achieved by 10-fold cross-validation and grid search. The support vector machine (SVM), random forest (RF), and artificial neural network (ANN) were adopted for model training. The area under the receiver operating characteristic curve (AUROC) and precision-recall curve (AUPRC), true positive ratio (TPR), true negative ratio (TNR), specificity, sensitivity, accuracy, and kappa coefficient were used for model evaluation. All trained models were internally validated. The importance of features was also analyzed. Results: In the training set, all the models had AUROC exceeding 0.8. SVM and the RF models had an AUPRC of more than 0.6, but the ANN model had an unexpectedly low AUPRC (0.380). The RF and ANN models revealed similar TPR, whereas the ANN model had a higher TNR and demonstrated better specificity, sensitivity, accuracy, and kappa efficiency. In the testing set, most performance indicators of established models decreased. However, the RF and SVM models maintained adequate AUROC (0.828 vs. 0.719) and AUPRC (0.413 vs. 0.520), and the RF model also had better TPR, specificity, sensitivity, accuracy, and kappa efficiency. Site leakage showed the most considerable mean decrease in accuracy. Conclusions: The RF and SVM models could predict LDRM, with the RF model showing the best performance, and site leakage was the most meaningful predictor.

  • Preprint Article
  • 10.5194/egusphere-egu24-7941
Prediction modelling framework comparative analysis of dissolved oxygen concentration variations using support vector regression coupled with multiple feature engineering and optimization methods: A case study in China
  • Nov 27, 2024
  • Xizhi Nong + 3 more

Dissolved oxygen (DO) is an essential indicator for assessing water quality and managing aquatic environments, but it remains challenging to accurately understand and predict the spatiotemporal variation of DO concentrations under the complex effects of different environmental factors. In this study, a practical prediction framework was proposed for DO concentrations based on the support vector regression (SVR) model coupled with multiple intelligence techniques (i.e., four data denoising techniques, three feature selection rules, and four hyperparameter optimization methods). The holistic framework was tested using a data matrix (17532 observations in total) of 12 indicators from three vital water quality monitoring stations of the longest inter-basin water diversion project in the world (i.e., the Middle-Route of the South-to-North Water Diversion Project of China), covering the period from 2017 to 2020. The results showed that the proposed framework could successfully and accurately predict DO concentration variations in different geographical locations. The model using the “wavelet analysis–LASSO regression–random search–SVR” combination at the Waihuanhe station had the best prediction performance, with Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Error (MAE), and coefficient of determination (R2) values of 0.251, 0.063, 0.190, and 0.911, respectively. Combining feature selection with hyperparameter optimization techniques can significantly improve the robustness and accuracy of the prediction model and can provide a new universal and practical way of investigating and understanding the environmental drivers of DO concentration variations. For water quality management departments, the proposed framework can also identify the key parameters that should be monitored as environmental factors change.
More studies assessing potential integrated water quality risk using multiple indicators in mega water diversion projects and/or similar water bodies are required in the future.

  • Research Article
  • Cited by 80
  • 10.1016/j.ecolind.2022.109845
Prediction modelling framework comparative analysis of dissolved oxygen concentration variations using support vector regression coupled with multiple feature engineering and optimization methods: A case study in China
  • Jan 2, 2023
  • Ecological Indicators
  • Xizhi Nong + 5 more

  • Research Article
  • Cited by 59
  • 10.1016/j.cageo.2014.10.016
Multivariable integration method for estimating sea surface salinity in coastal waters from in situ data and remotely sensed data using random forest algorithm
  • Nov 10, 2014
  • Computers & Geosciences
  • Meiling Liu + 4 more

  • Research Article
  • Cited by 7
  • 10.1142/s0219876221430064
A Novel Parameters’ Identification Procedure for Aortic Walls Based on Hybrid Artificial Intelligence Approaches
  • Mar 28, 2022
  • International Journal of Computational Methods
  • Li Yang + 4 more

Research on the deformation characteristics and stress distribution of the aortic wall is of great significance. Reliable prediction of constitutive parameters requires an inverse process, which poses challenges. This work proposes an inverse procedure to identify the constitutive parameters of aortic walls, which integrates the nonlinear finite element method (FEM), a random forest (RF) model, and a hybrid Random Search (RS) and Grid Search (GS) algorithm. FEM models are first established to simulate nonlinear deformation of aortic walls subjected to uniaxial tension tests. A dataset of the nonlinear relationship between the engineering stress and main stretch of aortic walls is created using FEM models, and the nonlinear relationship is learned through the RF model. The hybrid RS&GS algorithms are used to adjust the major hyperparameters in RF. Then the optimized RF is utilized to predict constitutive parameters of aortic walls with the help of uniaxial tension tests. The prediction results show that the RF optimized by hybrid RS&GS (RF-RS&GS) approach is an effective and accurate approach to identify the constitutive parameters of aortic walls. The present RF-RS&GS model can be further extended for the predictions of constitutive parameters of other types of nonlinear soft materials. Additionally, the relative importance of constitutive parameters of aortic walls in the Gasser–Ogden–Holzapfel (GOH) strain energy function is investigated. It is found that the parameters [Formula: see text] and [Formula: see text] in GOH are most intensive to the engineering stress of aortic walls.

  • Research Article
  • Cited by 28
  • 10.15587/1729-4061.2021.242986
Development of prediction model of steel fiber-reinforced concrete compressive strength using random forest algorithm combined with hyperparameter tuning and k-fold cross-validation
  • Oct 29, 2021
  • Eastern-European Journal of Enterprise Technologies
  • Nadia Moneem Al-Abdaly + 3 more

Because of the incorporation of discontinuous fibers, steel fiber-reinforced concrete (SFRC) outperforms regular concrete. However, due to its complexity and limited available data, the development of SFRC strength prediction techniques is still in its infancy when compared to that of standard concrete. In this paper, the compressive strength of steel fiber-reinforced concrete was predicted from different variables using the Random Forest model. Case studies of 133 samples were used for this aim. To design and validate the models, we generated training and testing datasets. The proposed models were developed using ten important material parameters for steel fiber-reinforced concrete characterization. To minimize training and testing split bias, the approach used in this study was validated using the 10-fold Cross-Validation procedure. To determine the optimal hyperparameters for the Random Forest algorithm, the Grid Search Cross-Validation approach was utilized. The root mean square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE) between measured and estimated values were used to validate and compare the models. The Random Forest model achieved a prediction performance of RMSE=5.66, R2=0.88 and MAE=3.80. Compared with the traditional linear regression model, the Random Forest model produced better predictions of the compressive strength of steel fiber-reinforced concrete. The findings show that hyperparameter tuning with grid search and cross-validation is an efficient way to find the optimal parameters for the RF method. RF also produces good results and offers an alternative way to predict the compressive strength of SFRC.

  • Research Article
  • Cited by 99
  • 10.3390/s18093086
Predicting Spatial Variations in Soil Nutrients with Hyperspectral Remote Sensing at Regional Scale.
  • Sep 13, 2018
  • Sensors
  • Ying-Qiang Song + 5 more

Rapid acquisition of the spatial distribution of soil nutrients holds great implications for farmland soil productivity safety, food security and agricultural management. To this end, we collected 1297 soil samples and measured the content of soil total nitrogen (TN), soil available phosphorus (AP) and soil available potassium (AK) in Zengcheng, north of the Pearl River Delta, China. Hyperspectral remote sensing images (115 bands) of the Chinese Environmental 1A satellite were used as auxiliary variables and dimensionality reduction was performed using Pearson correlation analysis and principal component analysis. The TN, AP and AK of soil were predicted in the study area based on auxiliary variables after dimensionality reduction, along with stepwise linear regression (SLR), support vector machine (SVM), random forest (RF) and back-propagation neural network (BPNN) models; 324 independent points were used to verify the predictive performance. The BPNN model, which demonstrated the best predictive accuracy among all methods, combined ordinary kriging (OK) with mapping the spatial variations of soil nutrients. Results show that the BPNN model with double hidden layers had better predictive accuracy for soil TN (root mean square error (RMSE) = 0.409 mg kg−1, R2 = 44.24%), soil AP (RMSE = 40.808 mg kg−1, R2 = 42.91%) and soil AK (RMSE = 67.464 mg kg−1, R2 = 48.53%) compared with the SLR, SVM and RF models. The back propagation neural network-ordinary kriging (BPNNOK) model showed the best predictive results of soil TN (RMSE = 0.292 mg kg−1, R2 = 68.51%), soil AP (RMSE = 29.62 mg kg−1, R2 = 69.30%) and soil AK (RMSE = 49.67 mg kg−1 and R2 = 70.55%), indicating the best fitting ability between hyperspectral remote sensing bands and soil nutrients. According to the spatial mapping results of the BPNNOK model, concentrations of soil TN (north-central), soil AP (central and southwest) and soil AK (central and southeast) were respectively higher in the study area. 
The most important bands (464–517 nm) for soil TN (b10, b14, b20 and b21), soil AP (b3, b19 and b22) and soil AK (b4, b11, b12 and b25) exhibited the best response and sensitivity according to the SLR, SVM, RF and BPNN models. It was concluded that applying hyperspectral images (visible-near-infrared data) with the BPNNOK model is an efficient method for mapping and monitoring soil nutrients at the regional scale.
