Power Output Prediction for a Combined Cycle Power Plant Based on Environmental Conditions Using Machine Learning Methods
The use of machine learning methods in energy simulation enables the optimization of energy use and improves energy efficiency. In this research, the power output of a Combined Cycle Power Plant (CCPP) operating at full load was modeled and predicted from the surrounding environmental conditions. Historical CCPP operating data were used to model and predict power output under various environmental conditions. Four machine learning algorithms, namely Linear Regression (LR), Decision Tree (DT), Random Forest (RF), and Artificial Neural Network (ANN), were evaluated and compared. Model performance was measured with Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared. The Random Forest (RF) model achieved the best performance of all models, with an MAE of 2.314, an RMSE of 3.372, and an R-squared of 0.961. RF also performed best in external testing with new data, obtaining an MAE of 2.579, an RMSE of 3.315, and an R-squared of 0.957. These results are consistent with the previous testing, indicating that RF performs stably and reliably on larger and more diverse datasets. This research contributes to understanding the potential application of machine learning in the power generation industry, especially in CCPPs.
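The three metrics used to compare the CCPP models can be computed directly from paired observed and predicted values. The sketch below is a minimal plain-Python illustration; the power-output numbers are hypothetical (not data from the study), and R-squared is computed as 1 - SS_res / SS_tot, which is one common definition.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: the average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: squares residuals first, so large errors weigh more than in MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical net power outputs in MW, for illustration only.
observed = [450.0, 460.0, 470.0, 480.0]
predicted = [452.0, 457.0, 471.0, 484.0]
print(mae(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))
```

Because RMSE squares the residuals, it is always at least as large as MAE on the same data; a large gap between the two signals a few large errors.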
- Research Article
- 10.1007/s11250-025-04822-9
- Dec 29, 2025
- Tropical animal health and production
This study evaluated the comparative performance of nine machine learning (ML) algorithms for predicting 305-day first lactation milk yield (305FLMY) and total milk yield (TMY) in Murrah buffaloes. Data from 657 animals recorded over 24 years (2000 to 2023) were used, with inputs including animal details, year of calving, age at first calving (days), peak yield (kg), days to peak yield (DPY), and test-day milk yields on the 6th (TD1), 35th (TD2), and 65th (TD3) days. The ML algorithms included Artificial Neural Networks (ANN), Bayesian Regression (BR), Gaussian Process (GP), Gradient Boosting Machines (GBM), Multivariate Adaptive Regression Splines (MARS), Multiple Linear Regression (MLR), Random Forest (RF), Sequential Minimal Optimization Regression (SMOreg), and Support Vector Machines (SVM). Model performance was assessed using R², root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and bias. Among these, the RF model outperformed the others in predicting 305FLMY (R² = 78.43%, RMSE = 258.41, MAPE = 9.46%), while SVM gave the best predictive performance for TMY (R² = 71.76%, RMSE = 349.32, MAPE = 276.13). Conversely, ANN showed the weakest performance for both traits. These findings indicate the potential of RF and SVM models for accurate prediction of complex traits in buffalo breeding programs. Future research should aim to improve model interpretability and computational efficiency for practical on-farm application.
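MAPE and bias are less standardized than MAE and RMSE, so it helps to state the formulas. The sketch assumes MAPE = 100 × mean(|observed − predicted| / |observed|) and bias = mean(predicted − observed). The abstract does not give the study's exact definitions or sign convention for bias, and the milk-yield values below are hypothetical.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined if any observed value is zero."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def bias(y_true, y_pred):
    """Mean signed error (predicted - observed); assumed sign convention:
    positive means the model over-predicts on average."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 305-day milk yields (kg) and model predictions.
observed_kg = [2000.0, 2500.0]
predicted_kg = [2200.0, 2250.0]
print(mape(observed_kg, predicted_kg), bias(observed_kg, predicted_kg))
```

Note that errors of opposite sign cancel in the bias (here +200 and -250 kg) but not in MAPE, which is why the two are usually reported together.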
- Research Article
35
- 10.3390/su15065341
- Mar 17, 2023
- Sustainability
Air pollution in Macau has become a serious problem following the rapid industrialization of the Pearl River Delta (PRD) that began in the 1990s. Macau therefore needs an air quality forecast system that accurately predicts pollutant concentrations during pollution episodes so that the public can be warned ahead of time. Five state-of-the-art machine learning (ML) algorithms, namely artificial neural networks (ANN), random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), and multiple linear regression (MLR), were applied to build models forecasting PM2.5, PM10, and CO concentrations for the next 24 and 48 h, in order to determine the best algorithm for each pollutant and time scale. Daily air quality measurements in Macau from 2016 to 2021 were obtained for this work. The 2020 and 2021 data were used for model testing, while the data from the preceding four years (2016 to 2019) were used to build and train the models. Results show that all five models performed well for the 24-h forecast, with a high coefficient of determination (R2) and low root mean square error (RMSE), mean absolute error (MAE), and bias (BIAS). All the models also performed well enough in the 48-h forecast to be accepted as a two-day continuous forecast, even though their R2 values were lower than for the 24-h forecast. The 48-h forecasting model could be further improved by proper feature selection based on the 24-h dataset, using the Shapley Additive Explanations (SHAP) value test and the adjusted R2 value of the 48-h forecasting model. In conclusion, all five ML algorithms successfully forecast pollutant concentrations in Macau 24 and 48 h ahead, with the RF and SVM models performing best in predicting PM2.5 and PM10, and CO, in both the 24- and 48-h forecasts.
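Two data-preparation steps described above, holding out whole calendar years for testing and pairing each observation with a value 24 or 48 h later, can be sketched as follows. This assumes one record per day (so a 24-h horizon is a shift of one record and 48 h is two). The function names and data are hypothetical, and the study's actual feature construction is not described in the abstract.

```python
from datetime import date

def split_by_year(records, test_years):
    """Split (date, value) records into training and test sets by calendar year,
    so that no observation from a test year is used for training."""
    train = [r for r in records if r[0].year not in test_years]
    test = [r for r in records if r[0].year in test_years]
    return train, test

def make_horizon_pairs(values, horizon):
    """Pair each daily value with the value `horizon` days later
    (horizon=1 for a 24-h-ahead target, horizon=2 for 48 h ahead)."""
    return [(values[i], values[i + horizon]) for i in range(len(values) - horizon)]

# Hypothetical records: one illustrative reading per year, 2016 to 2021.
records = [(date(year, 1, 1), float(year - 2000)) for year in range(2016, 2022)]
train, test = split_by_year(records, test_years={2020, 2021})
print(len(train), len(test))  # 4 2
print(make_horizon_pairs([10.0, 12.0, 15.0, 11.0], horizon=2))  # [(10.0, 15.0), (12.0, 11.0)]
```

Splitting by year rather than by random shuffle keeps the evaluation honest for forecasting: the models never see data from the period they are tested on.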
- Research Article
5
- 10.3390/app131910666
- Sep 25, 2023
- Applied Sciences
Soil organic matter (SOM) is an essential component of soil fertility that plays a vital role in preserving healthy ecosystems. This study aimed to produce a map of SOM levels for the Batifa region in northern Iraq. Random forest (RF) and extreme gradient boosting (XGBoost) models were used to predict the spatial distribution of SOM. A total of 96 soil samples were collected from the surface layer (0–30 cm) of both cropland and soil areas in Batifa. In addition, remote sensing data were obtained from Landsat 8, including bands 1–7, 10, and 11. Supplementary variables such as the normalized difference vegetation index (NDVI), soil-adjusted vegetation index (SAVI), brightness index (BI), and digital elevation model (DEM) were used as predictors of SOM levels across the region. To evaluate the accuracy of the RF and XGBoost models in predicting SOM levels, statistical metrics including mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2) were used, with 80% of the data used for model training and 20% for validation. The XGBoost model showed higher accuracy in predicting SOM (MAE = 0.41, RMSE = 0.62, and R2 = 0.92) than the RF model (MAE = 0.65, RMSE = 0.96, R2 = 0.79). Band 10, DEM, SAVI, and NDVI were identified as the most important predictors for both models. The machine learning methodology used in this study has the potential to map SOM in similar settings. Furthermore, the results offer significant insights for stakeholders involved in soil management, helping to improve agricultural practices.
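NDVI and SAVI, two of the spectral covariates named above, are simple band ratios. The sketch uses the standard formulas NDVI = (NIR − Red) / (NIR + Red) and SAVI = (1 + L)(NIR − Red) / (NIR + Red + L), with the commonly used soil-brightness factor L = 0.5. That L value is an assumption, since the abstract does not state the study's setting. For Landsat 8 OLI, NIR is band 5 and Red is band 4. The reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def savi(nir, red, L=0.5):
    """Soil-adjusted vegetation index: (1 + L)(NIR - Red) / (NIR + Red + L).
    L = 0.5 is an assumed, commonly used default for intermediate vegetation cover."""
    return (1 + L) * (nir - red) / (nir + red + L)

# Hypothetical surface reflectance for one pixel (Landsat 8: NIR = band 5, Red = band 4).
nir, red = 0.30, 0.10
print(round(ndvi(nir, red), 3), round(savi(nir, red), 3))
```

In practice these functions would be applied element-wise to whole band rasters to produce covariate layers for the RF and XGBoost models.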
- Research Article
2
- 10.46481/jnsps.2024.2079
- Sep 8, 2024
- Journal of the Nigerian Society of Physical Sciences
If properly harnessed, wind energy could serve as a source of energy generation in Africa. This study compared the performance of two machine learning (ML) algorithms, linear regression and random forest, in predicting wind speed in five major African cities (Yaoundé, Pretoria, Nairobi, Cairo, and Abuja). Wind data for January 1, 2000, to December 31, 2022, were obtained from the Solar Radiation Data Archive. During preprocessing, 80% of the data were allocated for training and 20% for validation. The performance of the algorithms was evaluated using Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the coefficient of determination (R2). Nairobi had the highest mean wind speed (3.814795 m/s), closely followed by Cairo (3.606453 m/s), while Yaoundé had the lowest (1.090512 m/s). Based on the performance metrics, the two algorithms were competitive; however, Linear Regression (LR) outperformed Random Forest in predicting wind speed in every selected city: Yaoundé (RMSE = 0.3892, MAE = 0.3001, MAPE = 0.5030), Pretoria (RMSE = 1.2339, MAE = 0.9480, MAPE = 0.7450), Nairobi (RMSE = 0.6499, MAE = 0.5171, MAPE = 0.1872), Cairo (RMSE = 1.0909, MAE = 0.8544, MAPE = 0.3541), and Abuja (RMSE = 0.70245, MAE = 0.5441, MAPE = 0.4515). Therefore, linear regression is more reliable than random forest regression for predicting wind speed in these cities.
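Since linear regression came out ahead here, it is worth seeing how little machinery it needs. The sketch fits ordinary least squares with a single predictor in closed form. The abstract does not say which predictors the study used, so the predictor below (for example, previous-day wind speed) and its values are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Closed-form ordinary least squares for y = a + b * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

def predict_linear(a, b, x):
    """Apply the fitted line to new predictor values."""
    return [a + b * xi for xi in x]

# Hypothetical predictor (e.g. previous-day wind speed, m/s) and target wind speed (m/s).
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [1.5, 2.5, 3.5, 4.5]
a, b = fit_simple_linear_regression(x_train, y_train)
print(a, b)  # 0.5 1.0
```

A model this simple has very low variance, which is one plausible reason it can match or beat a random forest on smooth, strongly autocorrelated series such as daily wind speed.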
- Research Article
1
- 10.2478/amns.2023.2.00225
- Aug 21, 2023
- Applied Mathematics and Nonlinear Sciences
Chinese culture is the enduring root of the Chinese nation, and the Chinese language is an important tool in exchanges between China and the rest of the world. As China's international status has risen in recent years, people around the world have become increasingly interested in the Chinese language, and more and more of them go to China to study it. As a result, academic research on exchange and cooperation models for Chinese language international education has grown in recent years, but most of it focuses on quality rather than reality. This paper takes Chinese international education in colleges and universities as its research object and summarizes international scholars' findings on the exchange and cooperation model of Chinese international education. Building on the theoretical foundations of the random forest algorithm and on previous research, four hypotheses are proposed about the factors influencing the Chinese international education exchange and cooperation model, and, drawing on statistical theory, correlation coefficients and visualizations are used to show the magnitude and direction of the correlation between each factor and the model. The random forest algorithm is then optimized from two perspectives, feature selection and weighted random forests, and the exchange and cooperation model is constructed, with the coefficient of determination (R²) as the index of model accuracy and the mean absolute error (MAE) and root mean square error (RMSE) as indices of model error. The AQI indices of the four hypothesized factors were analyzed over 3, 6, and 9 years, respectively.
The results show that the random-forest-based model of Chinese language international education exchange and cooperation scores highly in the index analysis. In the analysis of the AQI indices over 3, 6, and 9 years, the coefficients of determination (R²) of the four models reached 0.84, 0.74, and 0.63, respectively, with a 14.23% increase in the evaluated indicator coefficients, an average reduction of 5.77% in the mean absolute error (MAE), and an average reduction of 5.42% in the root mean square error (RMSE). According to these results, the random-forest-based model constructed in this paper provides a good correlation analysis, which can promote Chinese language international education exchange and cooperation, deepen humanistic exchanges between China and other countries, and support the common development of the global economy.
- Research Article
52
- 10.1016/j.cageo.2014.10.016
- Nov 10, 2014
- Computers & Geosciences
Multivariable integration method for estimating sea surface salinity in coastal waters from in situ data and remotely sensed data using random forest algorithm
- Research Article
175
- 10.1016/j.compag.2018.10.014
- Oct 30, 2018
- Computers and Electronics in Agriculture
Artificial intelligence approach for the prediction of Robusta coffee yield using soil fertility properties
- Research Article
54
- 10.3390/su14031183
- Jan 21, 2022
- Sustainability
The prediction accuracy of machine learning (ML) models may depend not only on the input parameters and training dataset but also on whether an ensemble or an individual learning model is selected. The present study compares individual supervised ML models, namely gene expression programming (GEP) and artificial neural network (ANN), with an ensemble learning model, random forest (RF), for predicting river water salinity in terms of electrical conductivity (EC) and total dissolved solids (TDS) in the Upper Indus River basin, Pakistan. The proposed models were trained and tested on a dataset of seven input parameters chosen on the basis of significant correlation. The ensemble RF model was optimized by producing 20 sub-models and choosing the most accurate one. The goodness-of-fit of the models was assessed with well-known statistical indicators: the coefficient of determination (R2), mean absolute error (MAE), root mean squared error (RMSE), and Nash–Sutcliffe efficiency (NSE). The results showed a strong association between inputs and model outputs, with R2 values of 0.96, 0.98, and 0.92 for the GEP, RF, and ANN models, respectively. The comparison showed the relative superiority of RF over GEP and ANN. Among the 20 RF sub-models, the most accurate yielded R2 values of 0.941 and 0.938 with 70 and 160 estimators, respectively. The ensemble RF model yielded the lowest RMSE values, 1.37 on the training data and 3.1 on the testing data. Sensitivity analysis showed that HCO₃⁻ is the most influential variable, followed by Cl⁻ and SO₄²⁻, for both EC and TDS. Assessment of the models against external criteria confirmed the generalizability of all the aforementioned techniques.
In conclusion, the present research indicates that the RF model with selected key parameters could be prioritized for water quality assessment and management.
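The Nash–Sutcliffe efficiency listed above compares a model's squared errors with the variance of the observations. A minimal sketch of the standard formula follows; the EC values are hypothetical, and their units are not given in the abstract.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).
    1 is a perfect fit; 0 means the model is no better than the observed mean;
    negative values mean it is worse than the mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical electrical conductivity observations.
obs = [300.0, 450.0, 500.0, 650.0]
print(nash_sutcliffe_efficiency(obs, obs))            # 1.0
print(nash_sutcliffe_efficiency(obs, [475.0] * 4))    # 0.0
```

Computed on the same data, this formula is numerically identical to R² defined as 1 − SS_res/SS_tot. When a study instead reports R² as the squared Pearson correlation, the two can differ, because a biased model can still be highly correlated with the observations.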
- Research Article
23
- 10.1371/journal.pone.0183742
- Sep 6, 2017
- PLoS ONE
Knowledge about the spatial distribution of active-layer (AL) soil thickness is indispensable for ecological modeling, precision agriculture, and land resource management. However, details on AL soil thickness are difficult to obtain with conventional soil survey methods. This research investigates the feasibility and accuracy of mapping the spatial distribution of AL soil thickness with a random forest (RF) model using terrain variables at a small watershed scale. A total of 1113 soil samples collected from slope fields were randomly divided into calibration (770 samples) and validation (343 samples) sets. Seven terrain variables, namely elevation, aspect, relative slope position, valley depth, flow path length, slope height, and topographic wetness index, were derived from a 30 m digital elevation model (DEM). The RF model was compared with multiple linear regression (MLR), geographically weighted regression (GWR), and support vector machine (SVM) approaches on the validation set. Model performance was evaluated with the mean error (ME), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2). RF outperformed the MLR, GWR, and SVM models, giving the best ME (0.39 cm), MAE (7.09 cm), and RMSE (10.85 cm) and the highest R2 (62%). Sensitivity analysis showed that the DEM had less uncertainty than the AL soil thickness. The RF model indicated that elevation, flow path length, and valley depth were the most important factors affecting AL soil thickness variability across the watershed. These results demonstrate that the RF model is a promising method for predicting the spatial distribution of AL soil thickness from terrain parameters.
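The random division of the 1113 samples into calibration and validation sets can be reproduced with a seeded shuffle. In this sketch the integer IDs stand in for the soil samples, and the seed is an arbitrary assumption; the study does not report how its random split was generated.

```python
import random

def random_split(items, n_calibration, seed=0):
    """Shuffle a copy of `items` with a fixed seed and split it into
    calibration (first n_calibration items) and validation (the rest)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[:n_calibration], shuffled[n_calibration:]

samples = list(range(1113))  # hypothetical IDs standing in for the 1113 soil samples
calibration, validation = random_split(samples, n_calibration=770)
print(len(calibration), len(validation))  # 770 343
```

Fixing the seed makes the split repeatable, so the RF, MLR, GWR, and SVM models can all be compared on exactly the same validation samples.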
- Conference Article
4
- 10.2118/212044-ms
- Aug 1, 2022
Reservoir fluid PVT properties are measured in the laboratory for various uses in reservoir engineering evaluation and estimation. Despite the indispensability of these PVT parameters, PVT lab data are seldom available and, when available, may be unreliable. Instead, various empirical models have been developed and used in the industry. These empirical models are inherently inaccurate when used to predict the PVT properties of fluids from geological regions with different depositional environments and fingerprints. Artificial intelligence (AI) has evolved over the years and provided algorithms with the potential to yield accurate predictive models of bubblepoint pressure. This work tested several AI algorithms, compared their performance, and chose the random forest regression algorithm to develop a robust predictive model for estimating bubblepoint pressure. A total of 2,522 datasets obtained from oil reservoirs in different geographical locations were used for feature scaling of the input data and for training and testing the models. The independent variables (gas-oil ratio, temperature, oil density, and gas density) were confirmed to have a key influence on the dependent variable, bubblepoint pressure. The random forest model uses an ensemble learning approach: it combines the predictions of many decision trees by averaging them to make a more accurate prediction. The 'forest' generated by the random forest algorithm was trained through bootstrap aggregating (bagging), an ensemble meta-algorithm that improves the accuracy of machine learning algorithms. The data were split 70% for training and 30% for testing. The reliability, accuracy, and completeness of the predictive model were assessed through performance indices such as the root mean square error (RMSE) and mean absolute error (MAE).
The best model architecture was determined along with the corresponding test-set RMSE and correlation coefficient. Statistical and graphical error analysis showed that the random forest model performed better than existing models, with a correlation coefficient of 0.98 for bubblepoint pressure. Better accuracy in predicting reservoir properties could be achieved using this random forest reservoir fluid property prediction model.
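The bootstrap-aggregating idea described above can be shown with a deliberately simplified forest. This is a toy under stated assumptions, not the paper's model. Each "tree" is a depth-1 regression stump fitted on a bootstrap resample of the rows using a random subset of features, and the forest prediction is the average of all stumps. A real random forest (for example, scikit-learn's `RandomForestRegressor`) grows much deeper trees and draws a new feature subset at every split. The toy data below are hypothetical.

```python
import random

def fit_stump(X, y, feature_ids):
    """Return (feature, threshold, left_mean, right_mean) for the single split
    over `feature_ids` that minimises the summed squared error of the two leaves."""
    best = None
    for j in feature_ids:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:
        # Every sampled row has the same feature value: fall back to a constant leaf.
        m = sum(y) / len(y)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def fit_bagged_stumps(X, y, n_estimators=50, max_features=1, seed=0):
    """Train `n_estimators` stumps, each on a bootstrap resample of the rows
    and a random subset of `max_features` features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap: n rows drawn with replacement
        features = rng.sample(range(p), max_features)   # random feature subset for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict(forest, row):
    """Aggregation step of bagging: average the predictions of all stumps."""
    votes = [ml if row[j] <= thr else mr for j, thr, ml, mr in forest]
    return sum(votes) / len(votes)

# Hypothetical toy data: one input feature, target jumps from 100 to 300 at x = 10.
X = [[float(i)] for i in range(20)]
y = [100.0 if row[0] < 10 else 300.0 for row in X]
forest = fit_bagged_stumps(X, y, n_estimators=100, max_features=1, seed=1)
print(round(predict(forest, [2.0])), round(predict(forest, [17.0])))
```

Averaging over bootstrap resamples mainly reduces variance: any single stump's threshold depends on which rows it happened to draw, but the averaged prediction is far less sensitive to that.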
- Research Article
15
- 10.3390/app13148286
- Jul 18, 2023
- Applied Sciences
Seismic response assessment requires reliable information about subsurface conditions, including soil shear wave velocity (Vs). To properly assess seismic response, engineers need accurate information about Vs, an essential parameter for evaluating the propagation of seismic waves. However, measuring Vs is generally challenging due to the complex and time-consuming nature of field and laboratory tests. This study aims to predict Vs using machine learning (ML) algorithms from cone penetration test (CPT) data. The study utilized four ML algorithms, namely Random Forests (RFs), Support Vector Machine (SVM), Decision Trees (DT), and eXtreme Gradient Boosting (XGBoost), to predict Vs. These ML models were trained on 70% of the datasets, while their efficiency and generalization ability were assessed on the remaining 30%. The hyperparameters for each ML model were fine-tuned through Bayesian optimization with k-fold cross-validation techniques. The performance of each ML model was evaluated using eight different metrics, including root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), coefficient of determination (R2), performance index (PI), scatter index (SI), A10-I, and U95. The results demonstrated that the RF model consistently performed well across all metrics. It achieved high accuracy and the lowest level of errors, indicating superior accuracy and precision in predicting Vs. The SVM and XGBoost models also exhibited strong performance, with slightly higher error metrics compared with the RF model. However, the DT model performed poorly, with higher error rates and uncertainty in predicting Vs. Based on these results, we can conclude that the RF model is highly effective at accurately predicting Vs using CPT data with minimal input features.
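Two of the less familiar metrics above, the scatter index and the A10 index, are sketched below using definitions common in the geotechnical ML literature. The abstract does not give the study's exact formulas, so treat both as assumptions. SI is taken as RMSE divided by the mean observed value, and A10 as the fraction of samples whose observed/predicted ratio lies within [0.9, 1.1]. The Vs values are hypothetical.

```python
import math

def scatter_index(observed, predicted):
    """Scatter index (assumed definition): RMSE divided by the mean of the observed values."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)

def a10_index(observed, predicted):
    """A10 index (assumed definition): share of samples whose observed/predicted
    ratio falls within [0.9, 1.1], i.e. predictions within roughly +/-10%."""
    hits = sum(1 for o, p in zip(observed, predicted) if 0.9 <= o / p <= 1.1)
    return hits / len(observed)

# Hypothetical shear wave velocities in m/s.
vs_obs = [100.0, 200.0, 300.0, 400.0]
vs_pred = [105.0, 190.0, 360.0, 400.0]
print(a10_index(vs_obs, vs_pred))  # 0.75
print(scatter_index(vs_obs, vs_pred))
```

Because SI is scaled by the mean observation, it allows error levels to be compared across datasets with different Vs ranges, which raw RMSE does not.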
- Research Article
3
- 10.1007/s10489-022-04327-0
- Dec 10, 2022
- Applied Intelligence
Noninvasive assessment of skin structure using hyperspectral images has been intensively studied in recent years. Owing to the high computational cost of classical methods such as the inverse Monte Carlo (IMC) method, much research has aimed to use machine learning (ML) methods to reduce the time required to estimate parameters. This study evaluates the accuracy and estimation speed of ML methods for this purpose and compares them with the traditionally used inverse adding-doubling (IAD) algorithm. We trained three models, an artificial neural network (ANN), a 1D convolutional neural network (CNN), and a random forest (RF) model, to predict seven skin parameters. The models were trained on simulated data computed with the adding-doubling algorithm. To improve predictive performance, we introduced a stacked dynamic weighting (SDW) model that combines the predictions of all three individually trained models. The SDW model was trained on only a handful of real-world spectra, on top of the ANN, CNN, and RF models that had been trained on simulated data. Models were evaluated by the mean absolute error (MAE) of the estimated parameters, taking the surface inclination angle into account, and by comparing skin spectra with spectra fitted by the IAD algorithm. On simulated data, the RF model achieved the lowest MAE (0.0030), while the SDW model achieved the lowest MAE on in vivo measured spectra (0.0113). The shortest time to estimate parameters for a single spectrum was 93.05 μs. The results suggest that ML algorithms can produce accurate estimates of human skin optical parameters in near real time.
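The abstract does not specify how SDW computes its weights, so the sketch below does not reproduce it. It shows a simpler, static stand-in: each base model is weighted in proportion to the inverse of its validation error, and the per-parameter estimates are averaged with those weights. The function names, error values, and parameter estimates are all hypothetical.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1 / error, normalised to sum to 1,
    so a model with lower validation error gets a larger weight."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]

def weighted_ensemble(predictions, weights):
    """Weighted average of per-model parameter vectors (one list per model)."""
    n_params = len(predictions[0])
    return [sum(w * pred[k] for w, pred in zip(weights, predictions)) for k in range(n_params)]

# Hypothetical validation MAEs for ANN, CNN and RF, and their estimates of two skin parameters.
weights = inverse_error_weights([0.01, 0.02, 0.04])
combined = weighted_ensemble([[0.10, 0.50], [0.20, 0.40], [0.40, 0.30]], weights)
print(weights, combined)
```

A genuinely stacked model would instead train a second-level learner on the base models' outputs, which is what allows SDW to be adapted using only a handful of real spectra.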
- Research Article
3
- 10.1186/s40068-025-00402-w
- Jun 17, 2025
- Environmental Systems Research
Understanding the spatial variability of soil erodibility and its associated indices across different land uses is critical for sustainable land use planning and management. Traditional methods for measuring these variables are often time-consuming and costly. To address this, the study employed digital soil mapping (DSM) and machine learning (ML) models as efficient and cost-effective alternatives for predicting soil erodibility and its indices, including clay ratio, critical level of organic matter, crust formation, dispersion ratio, and soil aggregate stability. Fifty surface soil samples (0–20 cm depth) were collected from forest, agricultural, and pasture land uses, and their physicochemical properties were determined through laboratory analyses. The study used Multiple Linear Regression (MLR) and machine learning models, including Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN), and an ensemble of the four single models. These models were trained using repeated tenfold cross-validation and evaluated by root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). The ANN model outperformed the others in predicting soil erodibility (R2 = 0.98, MAE = 0.00341, RMSE = 0.0031). The SVM and RF models also performed well, with SVM achieving R2 = 0.93, MAE = 0.00541, RMSE = 0.0038, and RF achieving R2 = 0.87, MAE = 0.0037, RMSE = 0.00557 for soil erodibility prediction. The superior performance of ANN is attributed to its ability to model complex, non-linear interactions among the variables influencing soil erodibility. Nonetheless, challenges such as data quality requirements and the risk of overfitting highlight the need for careful model calibration. The spatial prediction of soil erodibility across land uses revealed distinct patterns.
Forest soils exhibited the lowest mean erodibility values (0.0313 t ha⁻¹ h MJ⁻¹ mm⁻¹), reflecting their higher resistance to erosion due to better soil structure and organic matter content. In contrast, agricultural land uses recorded the highest mean erodibility values (0.0320 t ha⁻¹ h MJ⁻¹ mm⁻¹), likely due to frequent tillage and reduced vegetation cover, which increase erosion susceptibility. Among soil types, Calcaric Cambisols were identified as the most erosion-prone, while Lithic Leptosols were the least susceptible, attributed to differences in soil texture, structure, and organic matter content. Finally, the basin was classified based on soil erodibility classes. The analysis showed that 81.18% of the basin (covering 546.6 km²) falls under the less erodible class, highlighting the basin’s overall resilience to erosion. In conclusion, the study demonstrates that machine learning-based models can accurately predict soil erodibility and its indices. The resulting maps provide a valuable baseline for land use planning, natural resource management, and decision-making processes.
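Repeated tenfold cross-validation, used above to train the erodibility models, runs ordinary k-fold cross-validation several times with a fresh shuffle each round. Each sample therefore lands in a test fold once per round. The sketch assumes the study's 50 samples, but the number of repeats (3) and the seed are hypothetical, since the abstract does not state them.

```python
import random

def repeated_kfold_indices(n, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of k-fold CV,
    reshuffling the sample order before each round."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(k):
            test = order[f::k]          # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in order if i not in test_set]
            yield train, test

# 50 samples, as in the study; 3 repeats is an assumed value.
splits = list(repeated_kfold_indices(50, k=10, repeats=3))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 30 45 5
```

With only 50 samples, a single 10-fold partition can give noticeably different scores depending on the shuffle, so averaging over several repeats yields a more stable performance estimate.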
- Research Article
8
- 10.1007/s11069-022-05584-5
- Sep 5, 2022
- Natural Hazards
Wadi El-Matulla, located in the eastern desert of Egypt, is the most important water basin. The Qift–Qusayr highway (west–east) and the Cairo–Aswan eastern desert highway (north–south) pass through the watershed. Many urban areas (villages and industrial areas) and agricultural lands are located at the outlets of these basins. In addition, the basin has promising potential for future economic and urban development, as it lies within the Golden Triangle (a governmental megaproject). The current study investigates flood hazard modeling and its impact on the area. To determine the optimal flood susceptibility mapping algorithm, three techniques were compared: logistic regression (LR), extreme gradient boosting (EGB), and random forest (RF). Remote sensing, topographic, geologic, and meteorological data, supported by field visits, provided the spatial and inventory database required by the models. The performance and reliability of the model predictions were evaluated with five statistical indices: the area under the receiver operating characteristic curve (ROC-AUC), overall accuracy (OAC), kappa index, root mean square error (RMSE), and mean absolute error (MAE). RF, EGB, and LR achieved ROC-AUC values of 93%, 86%, and 80%; OAC values of 88%, 82%, and 76%; kappa indices of 0.85, 0.75, and 0.51; RMSE values of 0.34, 0.42, and 0.49; and MAE values of 0.12, 0.18, and 0.24, respectively. Based on the AUC values, the RF and EGB models provide excellent and very good predictions of flood susceptibility, respectively. Our results show that RF is the optimal algorithm for flood susceptibility mapping, followed by EGB and LR. Given the good predictive power of the RF model, its flood susceptibility map was classified into five classes: very low (51.7%), low (23.7%), moderate (16.2%), high (7.1%), and very high (1.3%).
Finally, the RF model was verified against Sentinel-1 imagery of real floods in 2016 and 2021, showing good agreement. The optimal model could help decision makers and planners protect existing facilities and plan future projects in areas that are not flood-prone. Accordingly, future development should be located mainly in the low and very low flood hazard areas.
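Three of the classification indices above can be computed from binary flood labels and model outputs. The sketch uses the standard definitions:

- Overall accuracy is the share of matching labels.
- Cohen's kappa is (p_o − p_e) / (1 − p_e), where p_e is the agreement expected by chance.
- AUC is the probability that a randomly chosen flooded cell scores higher than a randomly chosen non-flooded one, with ties counting one half.

The labels and scores are hypothetical.

```python
def overall_accuracy_and_kappa(y_true, y_pred):
    """Overall accuracy and Cohen's kappa for binary labels (1 = flooded, 0 = not flooded)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    po = (tp + tn) / n                                          # observed agreement = overall accuracy
    true_pos = sum(y_true) / n
    pred_pos = sum(y_pred) / n
    pe = true_pos * pred_pos + (1 - true_pos) * (1 - pred_pos)  # agreement expected by chance
    return po, (po - pe) / (1 - pe)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical inventory labels, thresholded predictions, and susceptibility scores.
labels = [1, 1, 0, 0]
print(overall_accuracy_and_kappa(labels, [1, 0, 0, 0]))  # (0.75, 0.5)
print(roc_auc(labels, [0.9, 0.4, 0.6, 0.1]))             # 0.75
```

Kappa discounts the agreement a model would reach by chance. This is why LR's kappa (0.51) falls much further below RF's (0.85) than its overall accuracy does (76% versus 88%).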
- Research Article
6
- 10.1038/s41598-025-01265-y
- May 16, 2025
- Scientific Reports
Hospital outpatient volume is influenced by a variety of factors, including environmental conditions and the availability of healthcare resources. Accurate prediction of outpatient demand can significantly improve operational efficiency and optimize the allocation of medical resources. This study aims to develop a predictive model for daily hospital outpatient volume using the XGBoost algorithm and compares its forecasting performance with that of Seasonal AutoRegressive Integrated Moving Average with exogenous regressors (SARIMAX) and Random Forest (RF) models. The dataset comprises daily climate data (e.g., temperature, precipitation, PM2.5 levels), historical outpatient volume records, and the number of outpatient specialists available each day, spanning January 1, 2014, to October 31, 2024. Data preprocessing involved handling missing values and encoding categorical variables. Model performance was assessed with four metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and R-squared (R2). The XGBoost model showed better predictive accuracy than both the SARIMAX and RF models, with the lowest MAE, RMSE, and MAPE and the highest R2, successfully capturing key relationships between climate factors, resource availability, and outpatient volume. The number of outpatient specialists, temporal variables (such as year, quarter, month, and weekday), meteorological conditions (average temperature), and air quality (PM2.5) had the greatest impact on the predictions. This study underscores the potential of machine learning algorithms such as XGBoost for predicting hospital outpatient demand. The findings can help hospitals proactively adjust their resource allocation and thereby improve their service capacity.
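The temporal variables reported as influential (year, quarter, month, and weekday) can all be derived from each record's date. The sketch below uses Python's `datetime` module. The feature names and the Monday = 0 weekday encoding are assumptions, since the abstract does not describe how the study encoded these variables.

```python
from datetime import date

def temporal_features(d):
    """Calendar features of the kind the study reports as influential.
    Encoding is assumed: quarter 1-4, month 1-12, weekday Monday = 0 ... Sunday = 6."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "weekday": d.weekday(),
    }

print(temporal_features(date(2024, 10, 31)))  # {'year': 2024, 'quarter': 4, 'month': 10, 'weekday': 3}
```

Tree-based models such as XGBoost and RF can use these integer codes directly. A linear or SARIMAX-style model would typically need them one-hot encoded or expressed as seasonal terms instead.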