Cloud radon data for earthquake magnitude prediction using machine learning

Abstract

The study investigates the potential of integrating radon gas concentration telemonitoring systems with machine learning techniques to enhance earthquake magnitude prediction. Conducted in Pacitan, East Java, Indonesia, where the monitoring stations are near the active Grundulu fault, the research employs Random Forest (RF), Extreme Gradient Boosting (XGB), Neural Network (NN), AdaBoost (AB), and Support Vector Machine (SVM) methods. Using real-time radon gas concentration measurements, the study aims to refine earthquake magnitude prediction, which is crucial for disaster preparedness. The evaluation uses multiple metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Squared Error (MSE), Symmetric Mean Absolute Percentage Error (SMAPE), and cnSMAPE. XGB and SVM emerge as the top performers, with the lowest errors across the metrics. XGB achieved MAE (0.33), MAPE (6.03%), RMSE (0.51), MSE (0.26), SMAPE (0.06), and cnSMAPE (0.97), while SVM recorded MAE (0.34), MAPE (6.20%), RMSE (0.51), MSE (0.26), SMAPE (0.06), and cnSMAPE (0.97). The two methods tie on RMSE, MSE, SMAPE, and cnSMAPE, but XGB has the lower MAE and MAPE, making it the most effective method overall. The study underscores the importance of expanding data availability to enhance predictive models, ultimately contributing to more precise earthquake magnitude predictions and more effective mitigation strategies.
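
The error metrics listed above can be computed in a few lines. The sketch below is a minimal pure-Python version, not the authors' code. The abstract does not define cnSMAPE, so the sketch assumes cnSMAPE = 1 − SMAPE/2 with SMAPE on a 0 to 2 scale; this assumption is chosen only because it maps the reported SMAPE of 0.06 to the reported 0.97. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Error metrics named in the abstract for magnitude predictions.

    Assumes every actual value is non-zero (MAPE divides by it) and that
    cnSMAPE = 1 - SMAPE / 2, a hypothetical definition (see lead-in).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE in percent.
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    # SMAPE on the 0..2 scale: |F - A| / ((|A| + |F|) / 2), averaged.
    smape = sum(
        abs(e) / ((abs(a) + abs(p)) / 2) for e, a, p in zip(errors, actual, predicted)
    ) / n
    cnsmape = 1.0 - smape / 2.0  # assumed complement, 1.0 for a perfect fit
    return {"MAE": mae, "MSE": mse, "RMSE": rmse,
            "MAPE": mape, "SMAPE": smape, "cnSMAPE": cnsmape}
```

For example, `regression_metrics([5.0, 4.0], [5.5, 4.0])` gives an MAE of 0.25 and a MAPE of 5.0 (%). Because RMSE is the square root of MSE, the reported RMSE of 0.51 and MSE of 0.26 are mutually consistent.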

Similar Papers
  • Research Article
  • Cited by 2
  • 10.11591/ijres.v13.i3.pp577-585
Earthquake magnitude prediction in Indonesia using a supervised method based on cloud radon data
  • Nov 1, 2024
  • International Journal of Reconfigurable and Embedded Systems (IJRES)
  • Thomas Oka Pratama + 3 more

In the challenging realm of earthquake prediction, the reliability of forecasting systems has remained a persistent obstacle. This study focuses on earthquake magnitude prediction in Indonesia, using supervised machine learning techniques and cloud radon data. We present an analysis of the telemonitoring system, the data collection methods, and the application of regression-based machine learning algorithms. Using a dataset of 30 training instances and 105 test instances, the study evaluates multiple metrics to assess the efficacy of the prediction models. Our findings reveal that linear regression is the best earthquake magnitude prediction method, with the lowest values across multiple evaluation metrics: standard deviation 0.40, mean absolute error (MAE) 0.30, mean absolute percentage error (MAPE) 6%, root mean square error (RMSE) 0.52, mean squared error (MSE) 0.28, symmetric mean absolute percentage error (SMAPE) 0.06, and conformal normalized mean absolute percentage error (cnSMAPE) 0.97. Additionally, we discuss the implications of the results and their potential applications in enhancing existing earthquake prediction methodologies.
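
The abstract reports that linear regression performed best but does not list the model's inputs. As a hypothetical illustration only, the sketch below fits ordinary least squares with a single predictor, assumed here to be a radon concentration reading, to earthquake magnitude. The toy numbers are invented and are not study data; the function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising the squared error of y ~ a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Apply the fitted line to new predictor values."""
    return [intercept + slope * xi for xi in x]


# Hypothetical toy data: radon readings (arbitrary units) and magnitudes.
radon_train = [10.0, 20.0, 30.0, 40.0]
magnitude_train = [4.0, 4.5, 5.0, 5.5]
a, b = fit_simple_ols(radon_train, magnitude_train)
```

With the toy data above, the fitted line is magnitude = 3.5 + 0.05 × radon, so a reading of 50 predicts a magnitude of 6.0.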

  • Research Article
  • Cited by 2
  • 10.46481/jnsps.2024.2079
Wind speed prediction in some major cities in Africa using Linear Regression and Random Forest algorithms
  • Sep 8, 2024
  • Journal of the Nigerian Society of Physical Sciences
  • Timothy Kayode Samson + 1 more

If properly harnessed, wind energy could serve as a source of energy generation in Africa. This study compared the performance of two machine learning (ML) algorithms (linear regression and random forest) in predicting wind speed in five major African cities (Yaoundé, Pretoria, Nairobi, Cairo and Abuja). Wind data from January 1, 2000, to December 31, 2022, were collected from the Solar Radiation Data Archive. For preprocessing, 80% of the data were used for training and 20% for validation. The performance of the algorithms was evaluated using Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and the coefficient of determination (R2). The results show that Nairobi (3.814795 m/s), closely followed by Cairo (3.606453 m/s), has the highest mean wind speed, while Yaoundé (1.090512 m/s) has the lowest. Based on the performance metrics used, the two ML algorithms were competitive; still, linear regression (LR) outperformed random forest regression in predicting wind speed in all the selected cities: Yaoundé (RMSE = 0.3892, MAE = 0.3001, MAPE = 0.5030), Pretoria (RMSE = 1.2339, MAE = 0.9480, MAPE = 0.7450), Nairobi (RMSE = 0.6499, MAE = 0.5171, MAPE = 0.1872), Cairo (RMSE = 1.0909, MAE = 0.8544, MAPE = 0.3541) and Abuja (RMSE = 0.70245, MAE = 0.5441, MAPE = 0.4515). Therefore, linear regression is more reliable than random forest regression for predicting wind speed.
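
The abstract describes an 80/20 split and lists R2 among the metrics. The sketch below is a minimal illustration, not the authors' pipeline. The abstract does not say whether the split was random or chronological; a chronological split is assumed here because it avoids training on data from after the validation period. R2 is computed as 1 − SS_res/SS_tot. Function names are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into (train, validation) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def r_squared(actual, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical usage on a toy series of ten values.
train, validation = chronological_split(list(range(10)))
```

Here `train` holds the first eight values and `validation` the last two. Predicting the mean of the actual values at every point gives an R2 of 0, and a perfect prediction gives 1.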

  • Research Article
  • 10.3389/fpubh.2026.1687658
Forecasting multidrug-resistant organisms infection trends in a Chinese tertiary hospital (2014–2024): a comparative study of SARIMA, ETS, Prophet, and NNETAR models
  • Jan 29, 2026
  • Frontiers in Public Health
  • Haiyan Chen + 1 more

Background: Infections caused by multidrug-resistant organisms (MDROs) continue to pose serious challenges for hospital infection control, often resulting in longer hospitalizations, increased patient morbidity, and higher healthcare costs. While time series forecasting has gained traction as a tool for anticipating MDRO trends, there remains a lack of real-world studies comparing the effectiveness of different modeling approaches using hospital-based data. Objective: This study aimed to evaluate and compare the predictive performance of four time series models (SARIMA, ETS, Prophet, and NNETAR) using monthly MDRO infection data collected from a tertiary hospital in China between 2014 and 2023, with the goal of forecasting trends for 2024. Methods: Monthly MDRO infection rates from January 2014 to December 2023 were analyzed using R software. Stationarity was assessed through unit root tests, and differencing was applied as needed. Each model was fitted to the training dataset and used to forecast infection rates for 2024. Model accuracy was assessed by comparing forecasted values with actual 2024 data using root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (sMAPE), and mean absolute scaled error (MASE). Results: Among the models, SARIMA produced the most consistent and reliable forecasts (RMSE = 0.0469, MAE = 0.0424, MAPE = 20.74%, sMAPE = 21.27%, MASE = 0.932), with residuals satisfying tests for independence and normality. Although the ETS model achieved lower point errors (RMSE = 0.0367, MAE = 0.0305, MAPE = 14.46%, sMAPE = 14.81%, MASE = 0.670), its residual diagnostics raised concerns about its robustness. The Prophet (RMSE = 0.0499, MAE = 0.0439, MAPE = 20.41%, sMAPE = 22.15%, MASE = 0.563) and NNETAR (RMSE = 0.0697, MAPE = 30.60%, sMAPE = 30.60%, MASE = 0.072) models captured certain aspects of the data dynamics but showed lower overall robustness than SARIMA. Conclusion: Based on its overall robustness and diagnostic consistency, SARIMA is recommended for short- to medium-term forecasting of MDRO infection trends. The other models, while less reliable on their own, may still be valuable for validating trends and conducting sensitivity analyses to support hospital infection control planning.
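
MASE is the least familiar metric in this list. A minimal sketch of the usual definition (Hyndman and Koehler's form) follows: the forecast's out-of-sample MAE is divided by the in-sample MAE of a naive forecast on the training series. The abstract does not say whether a seasonal (season = 12 for monthly data) or non-seasonal naive benchmark was used, so the season length is left as a parameter; the function name is hypothetical.

```python
def mase(train, actual, forecast, season=1):
    """Mean absolute scaled error.

    The out-of-sample MAE of the forecast is divided by the in-sample MAE of a
    (seasonal) naive forecast on the training series, which predicts y[t] with
    y[t - season]. season=1 is the plain naive forecast; season=12 is a common
    choice for monthly data.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    if len(train) <= season:
        raise ValueError("training series must be longer than the season length")
    naive_errors = [abs(train[t] - train[t - season]) for t in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("naive in-sample MAE is zero; MASE is undefined")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale
```

A value below 1 means the forecast's out-of-sample MAE is smaller than the in-sample MAE of the naive benchmark. Unlike MAPE and sMAPE, the measure does not divide by individual actual values, so it stays defined when some rates are zero.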

  • Research Article
  • Cited by 15
  • 10.5430/ijba.v11n4p39
Evaluation of Several Error Measures Applied to the Sales Forecast System of Chemicals Supply Enterprises
  • Jun 30, 2020
  • International Journal of Business Administration
  • Ma Del Rocío Castillo Estrada + 5 more

The objective of industry in general, and of the chemical industry in particular, is to satisfy consumer demand for products, and the best way to do so is to forecast future sales and plan operations accordingly. Since the choice of the best sales forecast model largely depends on the accuracy of the selected indicator (Tofallis, 2015), this work compares seven techniques for quantifying the error of sales forecast models in order to select the most appropriate one. These error evaluation techniques are: Mean Percentage Error (MPE), Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Absolute Scaled Error (MASE), Symmetric Mean Absolute Percentage Error (SMAPE) and Mean Absolute Arctangent Percentage Error (MAAPE). The chemical product sales forecasts to which the error evaluation techniques are applied are those obtained and reported by Castillo et al. (2016, 2020). The error measures that yield adequate and convenient results for the six prediction techniques considered in this article, while remaining intuitive to interpret, are SMAPE and MAAPE. In this case, the most adequate technique for measuring the error of the sales prediction system turned out to be SMAPE.
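
MPE and MAAPE are the two less common measures in this list. A minimal sketch, assuming the usual definitions: MPE is the signed mean of (A − F)/A in percent, so positive and negative errors can cancel, and MAAPE is the mean of arctan(|(A − F)/A|), which stays bounded (at most π/2) even when an actual value is zero. Function names are hypothetical, and this is not the code used in the cited work.

```python
import math


def mpe(actual, forecast):
    """Mean percentage error in %: signed, so over- and under-forecasts can cancel.

    Assumes every actual value is non-zero.
    """
    return 100.0 * sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)


def _arctan_ape(a, f):
    """Arctangent of the absolute percentage error for one point (in radians)."""
    if a == 0:
        # |(a - f) / a| grows without bound, and arctan tends to pi/2.
        return 0.0 if f == 0 else math.pi / 2
    return math.atan(abs((a - f) / a))


def maape(actual, forecast):
    """MAAPE: mean of arctan(|(A - F) / A|), bounded between 0 and pi/2."""
    return sum(_arctan_ape(a, f) for a, f in zip(actual, forecast)) / len(actual)
```

For actual sales of [100, 200] and forecasts of [110, 180], the MPE is 0 because the −10% and +10% errors cancel, while MAAPE stays positive. This cancellation is one reason a signed measure such as MPE can look deceptively good.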

  • Research Article
  • Cited by 21
  • 10.1186/s12879-023-08184-1
Study of the influence of meteorological factors on HFMD and prediction based on the LSTM algorithm in Fuzhou, China
  • May 5, 2023
  • BMC Infectious Diseases
  • Hansong Zhu + 10 more

Background: This study used a complete set of eight meteorological indicators to explore their impact on hand, foot, and mouth disease (HFMD) in Fuzhou, and predicted the incidence of HFMD using the long short-term memory (LSTM) neural network, an artificial intelligence algorithm. Method: A distributed lag nonlinear model (DLNM) was used to analyse the influence of meteorological factors on HFMD in Fuzhou from 2010 to 2021. The numbers of HFMD cases in 2019, 2020 and 2021 were then predicted using the LSTM model with multifactor single-step and multistep rolling methods. The root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and symmetric mean absolute percentage error (SMAPE) were used to evaluate the accuracy of the model predictions. Results: Overall, the effect of daily precipitation on HFMD was not significant. Low (4 hPa) and high (≥ 21 hPa) daily air pressure difference (PRSD) and low (< 7 °C) and high (> 12 °C) daily air temperature difference (TEMD) were risk factors for HFMD. From 2019 to 2021, the RMSE, MAE, MAPE and SMAPE of using weekly multifactor data to predict the following day's HFMD cases were lower than those of using daily multifactor data. In particular, the RMSE, MAE, MAPE and SMAPE of using weekly multifactor data to predict the following week's daily average HFMD cases were much lower, and similar results were found in urban and rural areas, indicating that this approach was more accurate. Conclusion: The LSTM models in this study, combined with meteorological factors (excluding precipitation, PRE), can be used to accurately predict HFMD in Fuzhou, especially when predicting the following week's daily average cases from weekly multifactor data.
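
The abstract contrasts single-step and multistep prediction. A common way to prepare training data for either is a sliding window; the sketch below builds (input window, target) pairs from a univariate series. It is a simplification under stated assumptions: the study used multifactor meteorological inputs and a rolling forecasting procedure that the abstract does not detail, and the function name `make_windows` is hypothetical. The sketch only builds training samples; it does not implement the rolling forecast loop.

```python
def make_windows(series, n_lags, horizon=1):
    """Turn a 1-D series into (inputs, targets) pairs for supervised learning.

    Each input is `n_lags` consecutive values; each target is the next
    `horizon` values (horizon=1 gives single-step samples, horizon>1 gives
    direct multistep targets).
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    inputs, targets = [], []
    for start in range(len(series) - n_lags - horizon + 1):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags:start + n_lags + horizon])
    return inputs, targets
```

For example, a window of two lags over [1, 2, 3, 4, 5] yields the single-step pairs ([1, 2] → [3]), ([2, 3] → [4]) and ([3, 4] → [5]); with a horizon of 2, each target holds the next two values instead. For a multifactor model, each window element would be a vector of the meteorological features rather than a single count.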

  • Conference Article
  • 10.1109/cecit53797.2021.00183
Research on performance prediction of gas turbine air filtration system based on transformation-gated LSTM method
  • Dec 1, 2021
  • Jiachi Yao + 5 more

The air filtration system is the only line of defense between outside air and the gas turbine. Studying and predicting the performance trend of the air filtration system is therefore important for ensuring the safety, economy and reliability of the gas turbine. In this work, a gas turbine air filtration system test is carried out, measuring fine-filter differential pressure, ambient temperature and relative humidity. LSTM and transformation-gated LSTM (TG-LSTM) methods are then used to predict the performance trend of the air filtration system. Finally, the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and symmetric mean absolute percentage error (SMAPE) are used to evaluate the predictions quantitatively. The results show that the RMSE, MAE, MAPE and SMAPE of the TG-LSTM method are only 0.0110, 0.0076, 3.3778 and 3.4655, respectively. The prediction error of TG-LSTM is smaller than that of LSTM, so TG-LSTM predicts better than LSTM and is suitable for predicting the performance of gas turbine air filtration systems.

  • Research Article
  • Cited by 4
  • 10.1007/s11069-025-07134-1
HEM NAEMP: a novel hybrid ensemble model for North Anatolian Fault zone earthquake magnitude prediction
  • Jan 29, 2025
  • Natural Hazards
  • Elif Özceylan + 1 more

The application of machine learning to earthquake magnitude prediction is crucial because of its ability to process extensive datasets and identify intricate patterns, thereby enhancing the accuracy and timeliness of predictions. This capability is essential for improving preparedness and relief strategies against seismic activity. This study introduces a novel hybrid ensemble model, HEM NAEMP, developed specifically for predicting earthquake magnitudes along the North Anatolian Fault zone. The model integrates data from both the North Anatolian and San Andreas fault zones (the latter selected because of its tectonic similarity) to build a comprehensive dataset that includes newly extracted features. The novelty of this study lies in combining data from two different fault lines into a new dataset, extracting novel features, and developing a previously unused model that leverages this dataset and its features. The HEM NAEMP model employs several regression algorithms, including k-nearest neighbors, random forest, support vector machine, decision tree and extreme gradient boosting, to predict earthquake magnitude. The model's evaluation metrics are a mean squared error (MSE) of 0.011, mean absolute error (MAE) of 0.064, root mean squared error (RMSE) of 0.108, mean absolute percentage error (MAPE) of 0.268, R-squared (R2) of 0.92 and a training time of 2.44 s. These results are compared against Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN) and AutoRegressive Integrated Moving Average (ARIMA) models, showing that HEM NAEMP mostly has lower MAE and MAPE, a higher R2, and a shorter training time, thereby confirming its viability and efficiency.

  • Research Article
  • 10.13057/ijap.v15i2.104276
Evaluating The Accuracy of Gridded Climate Datasets for Precipitation, Surface Air Temperature, and Sea Surface Temperature in Central Java, Indonesia
  • Nov 2, 2025
  • INDONESIAN JOURNAL OF APPLIED PHYSICS
  • Iis Widya Harmoko + 3 more

Studies of climate information that rely on accurate and reliable data are essential for hydrometeorological monitoring, early warning, and assessing climate change impacts in areas with varied topography and limited observation data, such as Central Java, Indonesia. This study aims to assess the accuracy of gridded satellite and reanalysis datasets for three main variables. Precipitation was analyzed using CHIRPS, ERA5 Precipitation, and GSMaP products; surface air temperature (SAT) was assessed with ERA5-Land, FLDAS, and AIRS; and sea surface temperature (SST) was evaluated using OSTIA, RAMSSA, and GAMSSA. Observational data from six BMKG stations and iQuam served as the reference standard. The datasets were extracted using bilinear interpolation and evaluated using bias, mean absolute error (MAE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE) for precipitation, and root mean square error (RMSE). The evaluation showed that CHIRPS gave the best precipitation estimates, with the lowest RMSE and SMAPE (17.20 mm/day; 111.42 mm/month; 96.97% daily; 54.09% monthly) compared with ERA5 Precipitation and GSMaP. For SAT, ERA5-Land showed the best accuracy, with an MAE of 1.2°C and a MAPE below 10% at most locations. For SST, OSTIA showed the highest agreement with iQuam, with an RMSE of 0.246°C and a MAPE of 0.552% in the Southern Sea, while GAMSSA recorded the highest errors across all zones. This study shows how the performance of gridded datasets varies with spatial and temporal scale, illustrating the importance of validation against observational data. These results can guide researchers in selecting appropriate datasets for climate applications in tropical ocean areas.
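
Extracting a gridded value at a station location with bilinear interpolation can be sketched as follows. This is a minimal illustration, assuming the station lies inside one rectangular grid cell whose four corner values are known, with x as longitude and y as latitude; the bias sign convention (estimate minus observation) is also an assumption, and both function names are hypothetical.

```python
def bilinear(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Bilinearly interpolate a value at (x, y) inside the cell [x0, x1] x [y0, y1].

    q00 = f(x0, y0), q10 = f(x1, y0), q01 = f(x0, y1), q11 = f(x1, y1).
    """
    if x1 == x0 or y1 == y0:
        raise ValueError("degenerate grid cell")
    tx = (x - x0) / (x1 - x0)  # fractional position along x
    ty = (y - y0) / (y1 - y0)  # fractional position along y
    bottom = q00 * (1 - tx) + q10 * tx   # interpolate along x at y0
    top = q01 * (1 - tx) + q11 * tx      # interpolate along x at y1
    return bottom * (1 - ty) + top * ty  # then interpolate along y


def bias(observed, estimated):
    """Mean of (estimated - observed); positive values mean the dataset overestimates."""
    return sum(e - o for o, e in zip(observed, estimated)) / len(observed)
```

At the centre of a unit cell with corner values 0, 1, 2 and 3 (in the order q00, q10, q01, q11), the interpolated value is 1.5, and at a corner the function returns that corner's value exactly. Bilinear interpolation assumes the field varies smoothly within a cell, which may not hold for precipitation over the varied topography described above.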

  • Research Article
  • Cited by 4
  • 10.5937/fme2401078n
Vertical wind speed extrapolation using statistical approaches
  • Jan 1, 2024
  • FME Transactions
  • Hilal Nuha + 4 more

The wind power industry has grown significantly in size and popularity in recent years, and the latest statistics indicate that the sector is still thriving. One of the essential steps in developing wind energy projects, however, is finding suitable sites for wind farms, which requires understanding wind speed, wind direction, terrain, and environmental impacts. To predict wind energy production over the expected lifespan of a wind farm, wind speed must be extrapolated vertically to the hub height of the wind turbine. This study therefore presents a comprehensive evaluation of seven statistical approaches for vertical wind speed extrapolation: Generalized Linear Models (GLM), Linear Regression (LR), Support Vector Machines (SVM), Generalized Additive Models (GAM), Gaussian Process Regression (GPR), Regression Tree (RT), and Ensemble Regression (ER). The accuracy of these methods is assessed using performance metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Normalized RMSE (NRMSE), Normalized MSE (NMSE), Mean Bias Error (MBE), Mean Absolute Error (MAE), Mean Percentage Error (MPE), Mean Absolute Percentage Error (MAPE), Symmetric Mean Absolute Percentage Error (SMAPE), and R-squared (R2). The study concludes that, on average, GLM performs the best of all seven statistical methods.
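
NRMSE, NMSE and MBE have several competing definitions, and the abstract does not say which ones were used. The sketch below shows one common choice for each, labelled as an assumption: NRMSE normalised by the mean observed value, NMSE normalised by the product of the observed and predicted means, and MBE as predicted minus observed. Function names are hypothetical.

```python
import math


def mean_bias_error(observed, predicted):
    """MBE = mean(predicted - observed); positive values mean overestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def nrmse(observed, predicted):
    """RMSE normalised by the mean observed value (one of several conventions)."""
    n = len(observed)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


def nmse(observed, predicted):
    """MSE normalised by mean(observed) * mean(predicted) (one of several conventions)."""
    n = len(observed)
    mse = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n
    return mse / ((sum(observed) / n) * (sum(predicted) / n))
```

Normalising makes errors comparable across sites with different mean wind speeds, which matters when ranking extrapolation methods over several locations. Because the conventions differ, NRMSE and NMSE values from different studies should only be compared when their definitions match.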

  • Research Article
  • 10.3390/buildings16050905
Rebar Price Prediction in Guangzhou, China: A Comparison of Statistical, Machine Learning and Hybrid Models
  • Feb 25, 2026
  • Buildings
  • Jiangnan Zhao + 4 more

Price volatility in steel reinforcement bars (rebar) plays a pivotal role in managing construction project costs, with precise forecasting being essential for maintaining corporate profitability and ensuring market stability. This research conducts a comprehensive evaluation of five prominent forecasting models—Autoregressive Integrated Moving Average (ARIMA), eXtreme Gradient Boosting (XGBoost), Prophet, Long Short-Term Memory (LSTM), and Transformer—specifically applied to steel rebar price prediction. The study emphasizes the influence of feature selection, defined as the number of historical price data points utilized for prediction, on the accuracy of these models. Furthermore, it develops a hybrid forecasting framework grounded in a residual complementarity mechanism aimed at improving long-term predictive performance. The results reveal that the ARIMA model delivers consistent and reliable short-term forecasts, particularly within a two-month horizon, whereas the Prophet model effectively captures long-term price trends but suffers from notable short-term bias. A two-stage hybrid model (referred to as Combination Model II), which integrates ARIMA and Prophet through residual inversion, demonstrates superior forecasting accuracy over a six-month period. This hybrid approach surpasses the standalone ARIMA model by more than 70% across key evaluation metrics—including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Symmetric Mean Absolute Percentage Error (SMAPE), and Mean Absolute Scaled Error (MASE)—and exceeds the performance of the standalone Prophet model by over 90%. This integration effectively combines the high short-term precision of ARIMA with the long-term trend stability of Prophet. Within the domain of machine learning and deep learning models, XGBoost achieves optimal predictive accuracy when utilizing between one and four features. 
The predictive performance of LSTM does not exhibit a straightforward linear relationship with the number of features; however, certain feature combinations enable it to outperform other models. Transformer models maintain stable accuracy when employing feature sets ranging from one to five and twelve to seventeen, but display considerable variability in performance when the feature count lies between five and twelve. This investigation delineates the optimal parameter ranges and contextual applicability for each model. The proposed hybrid forecasting methodology, alongside a model transfer strategy encompassing data preprocessing adjustments, parameter optimization, and weight adaptation, offers practical applicability to other commodity markets such as cement and concrete. Consequently, this research provides a scientifically grounded framework to support procurement decision-making processes within construction enterprises.

  • Research Article
  • Cited by 2
  • 10.34172/jrhs141059
Prediction the groundwater level of Hamadan-Bahar Plain, west of Iran using support vector machines.
  • Oct 27, 2013
  • Journal of Research in Health Sciences
  • Lily Tapak + 2 more

Water is considered the main source of life, but water resources are limited and nonrenewable. Various factors have caused groundwater levels to decrease, so modeling and predicting groundwater level is of great importance. About 20 years of monthly groundwater level data (October 1991 to February 2012) from the Hamadan-Bahar Plain, west of Iran, based on piezometric heights related to hydrologic years, were used. The support vector machine (SVM), a new nonlinear regression technique, was used to predict groundwater level. The performance of the SVM model was assessed using R2, root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), the correlation coefficient and the efficiency coefficient (E), and was then compared with a classic time series model. The SVM model had higher R2 (0.933), E (0.950) and correlation (0.965), as well as lower RMSE (0.120), MAPE (0.140) and MAE (0.124). There was no significant difference between the values estimated by the two models and the observed values. SVM outperforms the classic time series model in predicting groundwater level; therefore, using the SVM model is reasonable for modeling and predicting groundwater level fluctuations in the Hamadan-Bahar Plain.
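
The correlation coefficient reported alongside R2 and E is, in its most common form, Pearson's r. A minimal sketch of that computation follows, assuming the Pearson definition (the abstract does not state it); the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

Applied to observed and predicted groundwater levels, an r near 1 indicates that the predictions track the observations closely. Because r is unchanged when a constant is added to the predictions, it cannot detect a systematic offset, which is why error measures such as RMSE and MAE are reported alongside it.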

  • Research Article
  • Cited by 1
  • 10.1186/s40069-025-00856-3
Explainable Machine Learning Framework with Experimental Validation for Strength Prediction of Magnesium Phosphate Cement
  • Nov 25, 2025
  • International Journal of Concrete Structures and Materials
  • Anxiang Song + 4 more

Magnesium Phosphate Cement (MPC) is recognized as an effective rapid repair material, with compressive strength serving as a key mechanical property indicator for its mortar formulations. Nevertheless, because of MPC's complex composition and formulation, predicting its compressive strength remains a significant challenge. In this study, a comprehensive database was developed, incorporating four key input variables: the magnesium-to-phosphate (M/P) molar ratio, the water-to-cement (W/C) mass ratio, the sand-to-binder (S/B) weight ratio, and the borax-to-magnesia (B/M) weight ratio. This dataset was used to train and validate eight machine learning models: the Lightweight Gradient Boosting (LGB) algorithm, Support Vector Machine (SVM), Decision Tree (DT), Extreme Gradient Boosting (XGB), Ridge Regression (RR), Random Forest (RF), Backpropagation Neural Network (BP), and Gradient Boosting (GB). The eight models were evaluated using Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), the Correlation Coefficient, and Root Mean Square Error (RMSE) to identify the optimal model, which was then optimized with the Gray Wolf Optimizer (GWO). The most accurate prediction of MPC compressive strength was attained with the XGB model, and the GWO-optimized XGB model improved MAPE, MAE, R2, and RMSE by 21.8%, 60.6%, 43.9%, and 55.3%, respectively, relative to the unoptimized XGB model. Using Shapley Additive exPlanations (SHAP) values and Partial Dependence Plots (PDP), this study identifies the most influential input variables and quantifies their effects on MPC compressive strength. The optimized model was validated against experimental data, demonstrating robust and conservative prediction behavior.
While the model is trained solely to predict compressive strength, its interpretability enables rational insights into how formulation variables influence strength, thereby supporting informed mix design decisions. This framework offers a reliable and transparent computational tool for preemptive strength assessment of MPC and guides the optimization of mechanical performance in structurally demanding applications.
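
The abstract does not describe how GWO was configured or which XGB hyperparameters it tuned. As a hedged sketch only, the code below implements the standard Grey (Gray) Wolf Optimizer update, with alpha, beta and delta leaders and a coefficient a that decays linearly from 2 to 0, to minimise a generic objective over a box. In a tuning setting, the objective would be a validation error as a function of hyperparameters; here it is demonstrated on a simple sphere function rather than on the study's model, and all names are hypothetical.

```python
import random


def grey_wolf_optimize(objective, dim, lower, upper, n_wolves=20, n_iter=100, seed=0):
    """Minimise `objective` over the box [lower, upper]^dim with a basic Grey Wolf Optimizer."""
    if n_wolves < 3:
        raise ValueError("the pack needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    # Random initial positions inside the search box.
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # alpha, beta, delta: the best three positions found so far
    for it in range(n_iter):
        # Update the leaders from the current pack plus the previous leaders.
        pool = sorted(wolves + leaders, key=objective)
        leaders = [list(p) for p in pool[:3]]
        a = 2.0 - 2.0 * it / n_iter  # decreases linearly from 2 towards 0
        new_pack = []
        for wolf in wolves:
            position = []
            for d in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a  # |A| > 1 favours exploration
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    total += leader[d] - A * distance
                # Average the three leader-guided moves and clip to the box.
                position.append(min(max(total / 3.0, lower), upper))
            new_pack.append(position)
        wolves = new_pack
    best = min(wolves + leaders, key=objective)
    return best, objective(best)
```

For example, `grey_wolf_optimize(lambda x: sum(v * v for v in x), dim=2, lower=-5.0, upper=5.0)` searches for the minimum of the sphere function, whose optimum is at the origin. For integer hyperparameters such as tree depth, a tuning wrapper would need to round positions before training, which this sketch does not do.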

  • Components
  • 10.7717/peerj-cs.746/table-10
Table 10: Forecasting accuracy measures of all models for daily recover patients on testing data.
  • Dec 16, 2021

Background: Forecasting the timing of a forthcoming pandemic reduces the impact of disease by enabling precautionary steps such as public health messaging and raising awareness among doctors. With the continuous and rapid increase in the cumulative incidence of COVID-19, statistical and outbreak prediction models, including various machine learning (ML) models, are being used by the research community to track and predict the trend of the epidemic and to develop appropriate strategies to combat and manage its spread. Methods: In this paper, we present a comparative analysis of several ML approaches, including Support Vector Machine, Random Forest, K-Nearest Neighbor and Artificial Neural Network, for predicting the COVID-19 outbreak in the epidemiological domain. We first apply the autoregressive distributed lag (ARDL) method to identify and model the short- and long-run relationships in the COVID-19 time series datasets; that is, we determine the lags between a response variable and its explanatory time series variables (the independent variables). The resulting significant lagged variables are then used in the regression model selected by ARDL to predict and forecast the trend of the epidemic. Results: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE) are used to measure model accuracy. The MAPE values of the best-selected models for confirmed, recovered and death cases are 0.003, 0.006 and 0.115, respectively, which fall within the category of highly accurate forecasts. In addition, we computed 15-day-ahead forecasts for daily deaths, recovered and confirmed patients, and the cases fluctuated over time in all respects. The results also show the advantages of ML algorithms for supporting decision-making on evolving short-term policies.

  • Components
  • 10.7717/peerj-cs.746/supp-2
Supplemental Information 2: Confirm VS Recover.
  • Dec 16, 2021

  • Components
  • 10.7717/peerj-cs.746/fig-5
Figure 5: Original and forecasted values of RF, SVM, KNN, and ANN models for daily death cases of COVID-19 of the testing set.
  • Dec 16, 2021
