Forecasting Fatal Traffic Accidents During Early Covid-19 And Increased Enforcement Activities: A Case Study from Hatay Province, Türkiye Using Classical Time Series and Ensemble Machine Learning Models

Abstract

This study investigates the temporal dynamics and predictive modelling of fatal traffic accidents in Hatay province, Türkiye, using both classical time series approaches (ARIMA, SARIMA, Holt–Winters) and machine learning techniques (Random Forest, Gradient Boosting). Monthly accident data from 2017–2021 were analysed through seasonal decomposition, stationarity testing, and comparative model evaluation. Results revealed a distinct seasonal pattern, with accident counts peaking during summer months and declining in winter, and a long-term trend showing a notable reduction in fatalities after 2017. Among the tested models, the Enhanced Gradient Boosting approach demonstrated the highest predictive accuracy (R² = 0.97, RMSE = 1.59), outperforming both classical time series and other ensemble methods. Forecast results for 2021 indicated seasonal peaks in June and August, corresponding to increased traffic density during the holiday period. The COVID-19 pandemic was associated with a marked short-term reduction in fatalities, though the effect appeared to diminish post-lockdown. These findings highlight the value of integrating advanced ensemble learning methods into traffic safety forecasting and underscore the importance of seasonally targeted interventions.
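The comparative evaluation described above rests on two metrics, RMSE and R², computed on a held-out period. A minimal sketch of that bookkeeping follows, using only the Python standard library. The series is synthetic and purely illustrative (it is not the Hatay data), and the seasonal-naive forecaster is a stand-in baseline chosen for brevity, not one of the models the study reports; the function names are likewise assumptions.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def seasonal_naive(history, horizon, period=12):
    """Forecast each future step with the value observed one season earlier."""
    start = len(history) - period
    return [history[start + (h % period)] for h in range(horizon)]

def synthetic_series(years=5):
    """Hypothetical monthly counts: a mid-year peak plus a slow decline."""
    series = []
    for t in range(years * 12):
        month = t % 12
        seasonal = 4.0 * math.cos(2 * math.pi * (month - 6) / 12)
        trend = 12.0 - 0.05 * t
        series.append(trend + seasonal)
    return series

series = synthetic_series()
train, test = series[:48], series[48:]   # four years to fit, one to evaluate
forecast = seasonal_naive(train, horizon=len(test))

print("RMSE:", round(rmse(test, forecast), 3))
print("R^2 :", round(r_squared(test, forecast), 3))
```

Any candidate model (ARIMA, Holt–Winters, a boosted ensemble) can be scored the same way by swapping its predictions in for `forecast`, which keeps the comparison on a common hold-out period.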

Similar Papers
  • Research Article
  • 10.14445/22315373/ijmtt-v67i6p516
English
  • Jun 25, 2021
  • International Journal of Mathematics Trends and Technology
  • Subian Saidi + 3 more

One of the most discussed research topics is the prediction or forecasting of COVID-19 data using classical time series methods (such as exponential smoothing) and machine learning methods. In practice, classical time series methods often produce quite large error rates. In this study, the researchers use nonparametric modeling with the kernel method to obtain a smaller error rate, and compare its results with those of the classical time series method. The comparison is based on MAPE, with the smallest value preferred. The data are COVID-19 data from Indonesia, with the total number of deaths per day as the variable. The kernel method gave better results than the classical time series method. Five kernel functions were used, namely the Gaussian, Epanechnikov, Triangular, Biweight, and Triweight, and were compared to find the best one. The triweight kernel function was determined to be the best, with the smallest error rate and a MAPE value of 0.9%. Keywords: Covid-19, Kernel Method, Mean Absolute Percentage Error, Time Series, Triweight Kernel Function.
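The five kernel functions named in this abstract have standard closed forms, and MAPE is a simple average of absolute percentage errors. The sketch below writes them out and plugs them into a Nadaraya-Watson smoother. The choice of Nadaraya-Watson, the bandwidth of 3, and the toy daily series are all assumptions made for illustration; the abstract does not say which kernel estimator or data preparation the authors used.

```python
import math

# Standard kernel functions; each integrates to 1 over the real line.
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "triangular": lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "biweight": lambda u: (15 / 16) * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "triweight": lambda u: (35 / 32) * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0,
}

def nadaraya_watson(x_train, y_train, x0, kernel, bandwidth):
    """Kernel-weighted average of y_train around the point x0."""
    weights = [kernel((x0 - x) / bandwidth) for x in x_train]
    total = sum(weights)
    if total == 0:
        return float("nan")  # no training point falls inside the kernel window
    return sum(w * y for w, y in zip(weights, y_train)) / total

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

# Hypothetical daily series, for illustration only.
days = list(range(30))
deaths = [100.0 + 5.0 * t for t in days]

for name, kernel in KERNELS.items():
    fitted = [nadaraya_watson(days, deaths, d, kernel, 3.0) for d in days]
    print(f"{name:13s} in-sample MAPE = {mape(deaths, fitted):.3f}%")
```

In-sample MAPE as printed here favours narrow kernels and small bandwidths; a fair comparison between kernels would score them on held-out days instead.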

  • Research Article
  • Cited by 1
  • 10.29244/ijsa.v5i2p284-303
Forecasting Currency in East Java: Classical Time Series vs. Machine Learning
  • Jun 30, 2021
  • Indonesian Journal of Statistics and Its Applications
  • J A Putri + 5 more

Most research on currency inflow and outflow in Indonesia has shown that these data contain both linear and nonlinear patterns, along with calendar variation effects. The goal of this research is to propose a hybrid model that combines ARIMAX and a Deep Neural Network (DNN), known as hybrid ARIMAX-DNN, to improve forecast accuracy for currency prediction in East Java, Indonesia. ARIMAX is a class of classical time series models that can accurately handle linear patterns and calendar variation effects, whereas DNN is a machine learning method that is well suited to nonlinear patterns. Data on 32 denominations of inflow and outflow currency in East Java are used as case studies. The best model was selected based on the smallest RMSE and sMAPE on the testing dataset. The results showed that the hybrid ARIMAX-DNN model improved forecast accuracy and outperformed the individual ARIMAX and DNN models for 26 denominations of inflow and outflow currency. Hence, it can be concluded that hybrids of classical time series and machine learning methods tend to yield more accurate forecasts than either type of model alone.

  • Research Article
  • Cited by 1
  • 10.35882/jeeemi.v6i3.452
A Comparative Study of Improved Ensemble Learning Algorithms for Patient Severity Condition Classification
  • Jul 25, 2024
  • Journal of Electronics, Electromedical Engineering, and Medical Informatics
  • Edi Ismanto + 3 more

The evolution of Electronic Health Records (EHR) has facilitated comprehensive patient record-keeping, enhancing healthcare delivery and decision-making processes. Despite these advancements, analyzing EHR data using ensemble machine learning methods poses unique challenges. These challenges include data dimensionality, imbalanced class distributions, and the need for effective hyperparameter tuning to optimize model performance. The study conducted a thorough comparative analysis of various ensemble machine learning (EML) models using Electronic Health Record (EHR) datasets. After addressing data imbalance and reducing dimensionality, the accuracy of the EML models showed significant improvement. Notably, the Gradient Boosting Machine (GBM) and CatBoost models exhibited superior performance with an accuracy of 73%, achieved through experiments involving dimensionality reduction and handling of imbalanced data. Furthermore, optimization techniques such as Grid Search and Random Search were employed to enhance the EML models. The results of model optimization revealed that the GBM + Random Search model performed the best, achieving an accuracy of 74%, followed by the XGBoost + Grid Search model with an accuracy of 73%. The GBM model also excelled in distinguishing between positive and negative classes, boasting the highest Area under Curve (AUC) value of 0.78, indicative of its superior classification capabilities compared to other models. This study emphasizes the significance of incorporating cutting-edge EML techniques into clinical workflows and emphasizes the revolutionary potential of GBM in classification modeling for patient severity conditions. Future research should focus on deep learning (DL) applications and the integration of these models.

  • Research Article
  • Cited by 201
  • 10.1016/j.cemconcomp.2021.104295
Interpretable Ensemble-Machine-Learning models for predicting creep behavior of concrete
  • Oct 13, 2021
  • Cement and Concrete Composites
  • Minfei Liang + 5 more

This study aims to provide an efficient and accurate machine learning (ML) approach for predicting the creep behavior of concrete. Three ensemble machine learning (EML) models are selected in this study: Random Forest (RF), Extreme Gradient Boosting Machine (XGBoost) and Light Gradient Boosting Machine (LGBM). Firstly, the creep data in the Northwestern University (NU) database are preprocessed by a prebuilt XGBoost model and then split into a training set and a testing set. Then, by Bayesian Optimization and 5-fold cross validation, the 3 EML models are tuned to achieve high accuracy (R² = 0.953, 0.947 and 0.946 for LGBM, XGBoost and RF, respectively). In the testing set, the EML models show significantly higher accuracy than the equation proposed by the fib Model Code 2010 (R² = 0.377). Finally, the SHapley Additive exPlanations (SHAP), based on cooperative game theory, are calculated to interpret the predictions of the EML model. The five most influential parameters for concrete creep compliance are identified by the SHAP values of the EML models as follows: time since loading, compressive strength, age when loads are applied, relative humidity during the test and temperature during the test. The patterns captured by the three EML models are consistent with theoretical understanding of the factors that influence concrete creep, which indicates that the proposed EML models give reasonable predictions.

  • Research Article
  • 10.21037/tlcr-2025-237
An online explainable ensemble machine learning model for predicting epidermal growth factor receptor mutation status in lung adenocarcinoma
  • Jul 28, 2025
  • Translational Lung Cancer Research
  • Qilong Song + 8 more

Background: Non-invasive determination of epidermal growth factor receptor (EGFR) mutation status is essential for selecting lung adenocarcinoma patients suitable for EGFR-tyrosine kinase inhibitors (EGFR-TKIs). This study aimed to develop and validate an online ensemble machine learning (EML) model that combines multiple machine learning (ML) models to predict the EGFR mutation status in lung adenocarcinoma. Methods: A total of 823 lung adenocarcinoma patients with known EGFR mutation status from three medical centers were divided into a training cohort (n=556) and a validation cohort (n=267) (ChiCTR2400083082 in the WHO International Clinical Trials Registry). Five ML models incorporating clinical and radiological characteristics (random forest (RF), logistic regression (LR), support vector machine (SVM), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost)), along with a CT-based deep learning (DL) model, were constructed to predict EGFR mutation status. Subsequently, an EML model was created by combining these models. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), and the SHapley Additive exPlanation (SHAP) method was used to explain the EML model. Results: In the training cohort, the AUCs for the RF, LR, SVM, LightGBM, XGBoost, DL, and EML were 0.851, 0.790, 0.810, 0.835, 0.853, 0.884, and 0.928, respectively. In the validation cohort, the AUCs for the RF, LR, SVM, LightGBM, XGBoost, DL, and EML were 0.753, 0.744, 0.732, 0.749, 0.751, 0.754, and 0.813, respectively. The Delong test indicated that the AUC of the EML model showed outstanding performance compared to the single models in both the training and validation cohorts. Decision curve analysis indicated that the EML model provided a clinically useful net benefit, and calibration curves showed good agreement. SHAP analysis identified predictive characteristics ranked by their contribution to the EML model: DL score, long-axis diameter, smoking history, pleural retraction, texture, vascular convergence, sex, air bronchogram, and bubblelike lucency. These characteristics were further used to develop an online web tool. Conclusions: The EML model could serve as a non-invasive and accurate method for predicting EGFR mutation status in lung adenocarcinoma.

  • Research Article
  • 10.1016/j.compbiomed.2025.110008
Machine learning prediction of overall survival in prostate adenocarcinoma using ensemble techniques.
  • May 1, 2025
  • Computers in biology and medicine
  • Declan Ikechukwu Emegano + 3 more


  • Research Article
  • Cited by 23
  • 10.3325/cmj.2008.49.734
Factors Associated with Fatal Traffic Accidents in Tirana, Albania: Cross-sectional Study
  • Dec 1, 2008
  • Croatian Medical Journal
  • Gentiana Qirjako + 3 more

To assess the prevalence of fatal road traffic accidents in Tirana, Albania, and describe their determinants. This cross-sectional study included all road traffic accidents recorded by the Traffic Police Department of Tirana district for the period 2000-2005. A structured questionnaire included information about the type of traffic accident (fatal vs non-fatal event), year of event, age and sex of the responsible party, reason of accident, location and time of event, and the type of vehicle involved. Multivariable-adjusted binary logistic regression analysis was used to assess the predictors of fatal road traffic accidents. Overall, there were 1578 recorded road traffic accidents in Tirana district during 2000-2005. Of these, 272 (17%) were fatal. Multivariable-adjusted models showed that younger age (OR, 3.97; 95% CI, 2.28-6.91), high speed (OR, 2.54; 95% CI, 1.62-3.98), and especially alcohol consumption (OR, 6.15; 95% CI, 3.54-10.66) were strong and significant predictors of fatal accidents. Fatal accidents were more prevalent on intercity roads (OR, 4.25; 95% CI, 3.11-5.82) and involved especially vans and trucks (OR, 4.12; 95% CI, 2.34-7.24). Young age, high speed, and alcohol are predictors of fatal road traffic accidents in Tirana district. These findings can serve as a basis for health care professionals and policymakers to create preventive measures for traffic accidents.

  • Research Article
  • 10.33545/surgery.2021.v5.i3b.740
Determinants of fatal road traffic accidents in the Democratic Republic of Congo from 2011 to 2016
  • Jul 1, 2021
  • International Journal of Surgery Science
  • Joachim Moba Ndongila + 5 more

Background and purpose: In many low-income countries, the increase in the number of vehicles is likely to have an impact on road traffic fatalities. The purpose of this study was to identify the determinants of fatal road traffic accidents in the Democratic Republic of Congo. Methods: This was an analytical cross-sectional study on data from road traffic accidents in 6 cities of the DRC over a period from 2011 to 2016, using data from the police stations of these 6 cities. It took into account all accidents on the public road (AVP) that were the subject of a report by police officers. Fatal traffic accident was the dependent variable, while socio-demographic characteristics and behavioral and environmental determinants were the independent variables. Results: In six years, 4,635 accidents were reported, of which 945 were fatal, an overall frequency of 20.4%. After adjustment in multivariate analysis, the dry season (aOR: 1.66; 95% CI: 1.41-1.96), public transport (aOR: 7.11; 95% CI: 5.58-9.05), wrong maneuver (aOR: 2.93; 95% CI: 2.22-3.87), wrong crossing (aOR: 3.91; 95% CI: 2.59-5.92), and drunk driving (aOR: 4.32; 95% CI: 3.56-5.23) were the independent determinants of fatal accidents. Conclusion: Fatal accidents were linked to human and environmental factors, hence the need for behavior change awareness campaigns.

  • Research Article
  • Cited by 4
  • 10.3390/bdcc9050138
A Comparative Study of Ensemble Machine Learning and Explainable AI for Predicting Harmful Algal Blooms
  • May 20, 2025
  • Big Data and Cognitive Computing
  • Omer Mermer + 2 more

Harmful algal blooms (HABs), driven by environmental pollution, pose significant threats to water quality, public health, and aquatic ecosystems. This study enhances the prediction of HABs in Lake Erie, part of the Great Lakes system, by utilizing ensemble machine learning (ML) models coupled with explainable artificial intelligence (XAI) for interpretability. Using water quality data from 2013 to 2020, various physical, chemical, and biological parameters were analyzed to predict chlorophyll-a (Chl-a) concentrations, which are a commonly used indicator of phytoplankton biomass and a proxy for algal blooms. This study employed multiple ensemble ML models, including random forest (RF), deep forest (DF), gradient boosting (GB), and XGBoost, and compared their performance against individual models, such as support vector machine (SVM), decision tree (DT), and multi-layer perceptron (MLP). The findings revealed that the ensemble models, particularly XGBoost and deep forest (DF), achieved superior predictive accuracy, with R² values of 0.8517 and 0.8544, respectively. The application of SHapley Additive exPlanations (SHAP) provided insights into the relative importance of the input features, identifying the particulate organic nitrogen (PON), particulate organic carbon (POC), and total phosphorus (TP) as the critical factors influencing the Chl-a concentrations. This research demonstrates the effectiveness of ensemble ML models for achieving high predictive accuracy, while the integration of XAI enhances model interpretability. The results support the development of proactive water quality management strategies and highlight the potential of advanced ML techniques for environmental monitoring.

  • Preprint Article
  • Cited by 1
  • 10.31223/x5370r
Predicting Harmful Algal Blooms Using Ensemble Machine Learning Models and Explainable AI Technique: A Comparative Study
  • Nov 1, 2024
  • Omer Mermer + 2 more

Harmful Algal Blooms (HABs), driven by environmental pollution, pose significant threats to water quality, public health, and aquatic ecosystems. This study aims to enhance the prediction of HABs in Lake Erie, part of the Great Lakes system, by utilizing ensemble machine learning (ML) models coupled with explainable artificial intelligence (XAI) for interpretability. Using water quality data from 2013 to 2020, various physical, chemical, and biological parameters were analyzed to predict chlorophyll-a (Chl-a) concentrations, a proxy for algal blooms. The study employed multiple ensemble ML models, including Random Forest (RF), Deep Forest (DF), Gradient Boosting (GB), and XGBoost, and compared their performance against individual models such as Support Vector Machine (SVM), Decision Tree (DT), and Multi-Layer Perceptron (MLP). The findings reveal that ensemble models, particularly XGBoost and Deep Forest (DF), achieve superior predictive accuracy with R² values of 0.8517 and 0.8544, respectively. The application of SHapley Additive exPlanations (SHAP) provided insights into the relative importance of input features, identifying Particulate Organic Nitrogen (PON), Particulate Organic Carbon (POC), and Total Phosphorus (TP) as critical factors influencing Chl-a concentrations. This research demonstrates the effectiveness of integrating ensemble ML models with XAI to improve HAB prediction accuracy and interpretability. The results support the development of proactive water quality management strategies and highlight the potential of advanced ML techniques in environmental monitoring.

  • Research Article
  • 10.1142/s2335680414500033
On the autocorrelation function and its applicability in energy modelling
  • Mar 1, 2014
  • International Journal of Energy and Statistics
  • Christina Beneki

The ACF plays an important role in time series analysis as it is used for identifying lags in all autoregressive models. Given that energy continues to be modelled by classical time series methods which rely on the ACF, this paper aims to evaluate whether such models are valid as recent evidence suggests that the sum of the ACF is always equal to -½, which in turn indicates that a set of ACF estimates are not IID, and leaves open for criticism the theory underlying classical time series methods. The applicability of this new theory in the energy sector is evaluated via an application into four real data sets which include the effects of structural breaks, seasonality and unit root problems. The evidence shows that there exists a fundamental flaw in classical time series models used for energy data modelling.
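The "-½" result mentioned in this abstract can be checked numerically. For the usual sample autocorrelation estimator (deviations taken from the full-sample mean, each lag divided by the lag-0 sum of squares), the autocorrelations over lags 1 to n-1 sum to -1/2 for any non-constant series, because the mean-centred deviations sum to zero. The sketch below assumes that estimator; the function name and the test series are illustrative.

```python
import random

def sample_acf(x):
    """Sample autocorrelations r(1), ..., r(n-1) using the full-sample mean."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares (series must not be constant)
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / c0
            for h in range(1, n)]

random.seed(0)
series = [random.gauss(0.0, 1.0) for _ in range(200)]  # arbitrary illustrative data
acf = sample_acf(series)

# The sum is -0.5 up to floating-point error, whatever the data.
print(sum(acf))
```

The identity is a property of the estimator rather than of the data, so it holds equally for trending or seasonal series.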

  • Research Article
  • Cited by 24
  • 10.1016/j.eswa.2023.119768
Ensemble machine learning-based models for estimating the transfer length of strands in PSC beams
  • Mar 1, 2023
  • Expert Systems with Applications
  • Viet-Linh Tran + 1 more


  • Research Article
  • Cited by 5
  • 10.1016/j.matpr.2024.04.081
Comparative analysis of conventional and ensemble machine learning models for predicting split tensile strength in thermal stressed SCM-blended lightweight concrete
  • Apr 1, 2024
  • Materials Today: Proceedings
  • Saad Shamim Ansari + 4 more


  • Research Article
  • Cited by 3
  • 10.1186/s12911-025-02874-3
Explainable AI for enhanced accuracy in malaria diagnosis using ensemble machine learning models
  • Apr 11, 2025
  • BMC Medical Informatics and Decision Making
  • Olushina Olawale Awe + 4 more

Background: Malaria, an infectious disease caused by protozoan parasites belonging to the Plasmodium genus, remains a significant public health challenge, with African regions bearing the heaviest burden. Machine learning techniques have shown great promise in improving the diagnosis of infectious diseases, such as malaria. Objectives: This study aims to integrate ensemble machine learning models and Explainable Artificial Intelligence (XAI) frameworks to enhance the diagnosis accuracy of malaria. Methods: The study utilized a dataset from the Federal Polytechnic Ilaro Medical Centre, Ilaro, Ogun State, Nigeria, which includes information from 337 patients aged between 3 and 77 years (180 females and 157 males) over a 4-week period. Ensemble methods, namely Random Forest, AdaBoost, Gradient Boost, XGBoost, and CatBoost, were employed after addressing class imbalance through oversampling techniques. Explainable AI techniques, such as LIME, Shapley Additive Explanations (SHAP) and Permutation Feature Importance, were utilized to enhance transparency and interpretability. Results: Among the ensemble models, Random Forest demonstrated the highest performance with an ROC AUC score of 0.869, followed by CatBoost at 0.787. XGBoost, Gradient Boost, and AdaBoost achieved ROC AUC scores of 0.770, 0.747, and 0.633, respectively. These methods evaluated the influence of different characteristics on the probability of malaria diagnosis, revealing critical features that contribute to prediction outcomes. Conclusion: By integrating ensemble machine learning models with explainable AI frameworks, the study promoted transparency in decision-making processes, thereby empowering healthcare providers with actionable insights for improved treatment strategies and enhanced patient outcomes, particularly in malaria management.

  • Research Article
  • 10.30645/j-sakti.v8i1.785
Comparative Analysis of Machine Learning Models for Enhanced Chemical Detection in Sensor Array Data
  • Mar 30, 2024
  • J-SAKTI (Jurnal Sains Komputer dan Informatika)
  • Gregorius Airlangga

The objective of this study was to compare the efficacy of various machine learning models for classifying chemical substances using sensor array data from a wind tunnel facility. Six widely recognized machine learning algorithms were assessed: Random Forest, Gradient Boosting, Logistic Regression, Support Vector Machine (SVM), Decision Tree, and K-Nearest Neighbors (KNN). The dataset, consisting of 288 sensor array features, was preprocessed and utilized to evaluate the models based on accuracy, precision, recall, and F1 score through a 5-fold cross-validation method. The results indicated that ensemble methods, particularly Random Forest and Gradient Boosting, outperformed other models, achieving an accuracy and F1 score of over 99%. KNN also demonstrated high efficacy with similar performance metrics. In contrast, Logistic Regression showed modest results in comparison. The study's outcomes suggest that ensemble machine learning models are highly suitable for chemical detection tasks, potentially contributing to advancements in environmental monitoring and public safety. The findings also highlight the importance of quality data preprocessing in achieving optimal model performance. Future research directions include exploring hybrid models, deep learning techniques, and assessing model robustness against environmental variabilities. This research underscores the transformative potential of machine learning in chemical detection and paves the way for developing more sophisticated and reliable detection systems.
