Evaluating the performance of GUIDE, MOB, and CART in traffic accident prediction: A case study from Ankara

Abstract

Traffic accidents on state roads represent a major concern, resulting in injuries, fatalities, and significant economic losses. Identifying the primary factors that contribute to these accidents is vital for developing effective prevention measures. This study examines the frequency of traffic accidents on state roads in Ankara by employing Poisson regression and regression tree models, focusing on the comparison of three regression tree algorithms. A detailed simulation study was carried out to assess the performance of the tree algorithms in terms of variable selection bias, Type I error rates, and statistical power. Among them, GUIDE demonstrated the most balanced performance, with unbiased variable selection and strong power, effectively controlling Type I errors. In contrast, the CART algorithm outperformed others in scenarios involving overdispersion. Both CART and GUIDE exhibited similar estimation errors in real-world applications, highlighting their robustness and reliability. Additionally, the study observed that the MOB algorithm tended to favor numerical variables over categorical ones when performing splits. Although it maintained adequate control over Type I errors, MOB showed relatively lower power, reflecting its limitations in data partitioning. In conclusion, the findings offer valuable insights into selecting appropriate regression tree algorithms for traffic accident analysis, providing guidance for enhancing road safety.
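
As a rough illustration of how a regression tree can be grown on accident counts, the sketch below searches one numeric predictor for the threshold that minimizes the summed Poisson deviance of the two child nodes. This is a CART-style exhaustive search written for this note, not the study's code; the abstract does not state which split criterion was used, so the Poisson deviance, the function names, and the toy data are all assumptions.

```python
import math


def poisson_deviance(counts):
    """Poisson deviance of a node whose fitted mean is the sample mean."""
    n = len(counts)
    if n == 0:
        return 0.0
    mu = sum(counts) / n
    if mu == 0:
        return 0.0  # all-zero node: the mean fits perfectly
    # The (y - mu) terms sum to zero, and y*log(y) is taken as 0 when y == 0.
    return 2.0 * sum(y * math.log(y / mu) for y in counts if y > 0)


def best_split(x, y):
    """Exhaustive search for the threshold on x with the lowest child deviance."""
    pairs = sorted(zip(x, y))
    best_threshold, best_dev = None, math.inf
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between tied x values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        dev = poisson_deviance(left) + poisson_deviance(right)
        if dev < best_dev:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_dev = dev
    return best_threshold, best_dev


# Hypothetical toy data: a traffic-volume index and accident counts per segment.
volume = [1, 2, 3, 4, 5, 6]
accidents = [0, 1, 0, 5, 6, 4]
threshold, deviance = best_split(volume, accidents)
print(threshold, round(deviance, 3))
```

On this toy data the search separates the three low-count segments from the three high-count ones. Searching every cut point of every variable in this way is what gives variables with many distinct values more chances to be selected, which is the kind of selection bias the simulation study measures.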

Similar Papers
  • Research Article
  • Citations: 83
  • 10.1016/j.ecoinf.2019.05.003
Classification and regression with random forests as a standard method for presence-only data SDMs: A future conservation example using China tree species
  • May 7, 2019
  • Ecological Informatics
  • Lei Zhang + 6 more

  • Research Article
  • Citations: 65
  • 10.1016/j.eswa.2010.04.068
A decision making system to automatic recognize of traffic accidents on the basis of a GIS platform
  • May 6, 2010
  • Expert Systems with Applications
  • S Savaş Durduran

  • Conference Article
  • 10.2991/iccmcee-15.2015.15
Research on Traffic Accident Forecasting Based on Gray Model
  • Jan 1, 2015
  • Lingling Tian + 1 more

As the road traffic system is an uncertain system, the occurrence of traffic accidents is also an uncertain system with partial information known and the other unknown. Therefore, it's suitable to apply the gray model theory to predict the traffic accidents. This paper expounds the principles of gray model and gives an example to show the feasibility and practicability of the gray model applied in the forecasting of traffic accidents.
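
For readers unfamiliar with the gray model, the sketch below implements the textbook GM(1,1) form: accumulate the series, fit the development coefficient a and the grey input b by least squares, and difference the fitted accumulated curve to get forecasts. It is not taken from the paper; the function names and the sample series are hypothetical, and the code assumes a short positive series with a nonzero a.

```python
import math


def gm11_fit(series):
    """Return (a, b) for the GM(1,1) relation x0(k) + a * z1(k) = b."""
    x0 = list(series)
    # Accumulated generating operation: running sums of the original series.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    # Ordinary least squares for y = slope * z + b, with slope = -a.
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (n * szy - sz * sy) / (n * szz - sz * sz)
    b = (sy - slope * sz) / n
    return -slope, b


def gm11_predict(series, k):
    """Predicted value of the original series at 0-based index k (k >= 1)."""
    a, b = gm11_fit(series)  # assumes a != 0 (a constant series would fail)

    def x1_hat(i):
        return (series[0] - b / a) * math.exp(-a * i) + b / a

    # Undo the accumulation by differencing consecutive fitted values.
    return x1_hat(k) - x1_hat(k - 1)


# Hypothetical yearly accident counts growing by about 10% per year.
history = [100, 110, 121, 133.1, 146.41]
print(round(gm11_predict(history, 5), 2))  # one-step-ahead forecast
```

On a series that grows geometrically the one-step forecast lands close to the next geometric term, which is why the model is often applied to short series with a steady trend.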

  • Research Article
  • Citations: 4
  • 10.47852/bonviewjdsis42022395
Monte Carlo Simulation-Based Regression Tree Algorithm for Predicting Energy Consumption from Scarce Dataset
  • Apr 15, 2024
  • Journal of Data Science and Intelligent Systems
  • Tony Darmanto + 2 more

Most data-driven techniques rely on the availability of data. Hence, when the data provided are not sufficient, the algorithm might not work as intended. Thus, it is important to be able to predict the dynamics of the data, even when the number of available data is low, or scarce. This study aimed to predict the power consumption of a building given a scarce dataset via a novel Monte Carlo simulation-based Regression Tree (MCRT) algorithm. The main idea is to train Monte Carlo simulation on each leaf generated by the regression tree algorithm. Thus, the prediction no longer depends on the average of the samples contained in the leaf, but now depends on the probability of the samples. The proposed algorithm was validated on 2 datasets obtained from Universitas Widya Dharma Pontianak (UWDP), Indonesia, and Trapeznikov Institute of Control Sciences (TICS), Russia. To show that the MCRT algorithm is better than the regression tree (RT) algorithm, a two-tail hypothesis was proposed. Based on the experiments which were run on Python software with 16 GB RAM, 7th Gen Core i7 machine on 50 datasets randomly generated from the UWDP electrical data, it can be concluded that the MCRT algorithm performs better than the previous RT algorithm used to model scarce datasets with P-value = 0.000319. Furthermore, the proposed algorithm improves the model predictive accuracy of the RT algorithm by up to 2%.

Received: 30 December 2023 | Revised: 21 March 2024 | Accepted: 8 April 2024

Conflicts of Interest: The authors declare that they have no conflicts of interest to this work.

Data Availability Statement: The data that support the findings of this study are openly available in Google Drive at https://docs.google.com/spreadsheets/d/1o8sawOaOcX1kEm-dIdkcCUZhKoBduTAz/edit?usp=drive_link&ouid=115962907255429746256&rtpof=true&sd=true.

Author Contribution Statement: Tony Darmanto: Conceptualization, Validation, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Supervision, Project administration. Jimmy Tjen: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft, Writing - review & editing, Visualization. Genrawan Hoendarto: Validation, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing.

  • Research Article
  • 10.22119/ijte.2018.49725
Assessing Behavioral Patterns of Motorcyclists Based on Traffic Control Device at City Intersections by Classification Tree Algorithm
  • Apr 1, 2018
  • International Journal of Transportation Engineering
  • Mohammad Mehdi Khabiri

According to forensic statistics, motorcyclists have accounted for 26 percent of those killed in traffic accidents in Iran in recent years. Given this high number of casualties, it is necessary to investigate the causes of motorcycle accidents. Motorcyclists' dangerous behaviors are among the causes discussed in this study. Traffic signs play an important role in controlling traffic, and road surface marking is a tool for traffic separation that has a significant effect on drivers' behaviors. The aim of this study is to investigate the effect of variables, including traffic conditions, motorcyclists' psychological conditions, and symptoms and function of traffic lights, on motorcyclists' dangerous behaviors. The classification tree method is used to determine the factors that influence dangerous behaviors such as the amount of deviation from the center lane, lane changing, and running red lights. The classification tree is easy to understand and interpret because its results are displayed graphically. In this study, the tree is built with the classification and regression tree algorithm (CRT). The data were collected at 7 intersections in a medium-sized city by video-based observation. Hand-held cameras randomly recorded the motorcyclists' motions, and these behaviors were then investigated in the office by playing back the videos in slow motion. The obtained trees show that traffic volume has the greatest impact on motorcyclists' deviation from the center lane and on lane changing. The clarity of the pavement marking is also effective in reducing deviation from the middle lane by cyclists: on streets with a line color contrast of more than 1.36, deviation from the center lane is reduced by 25 cm.

  • Research Article
  • Citations: 32
  • 10.1002/sim.7020
Identification of subgroups with differential treatment effects for longitudinal and multiresponse variables.
  • Jun 27, 2016
  • Statistics in Medicine
  • Wei-Yin Loh + 4 more

We describe and evaluate a regression tree algorithm for finding subgroups with differential treatments effects in randomized trials with multivariate outcomes. The data may contain missing values in the outcomes and covariates, and the treatment variable is not limited to two levels. Simulation results show that the regression tree models have unbiased variable selection and the estimates of subgroup treatment effects are approximately unbiased. A bootstrap calibration technique is proposed for constructing confidence intervals for the treatment effects. The method is illustrated with data from a longitudinal study comparing two diabetes drugs and a mammography screening trial comparing two treatments and a control. Copyright © 2016 John Wiley & Sons, Ltd.

  • Research Article
  • Citations: 14
  • 10.1088/1742-6596/2170/1/012003
Imbalanced Traffic Accident Text Classification Based on Bert-RCNN
  • Feb 1, 2022
  • Journal of Physics: Conference Series
  • Shijin Yuan + 1 more

Traffic is a sign of urban development, and traffic accidents have a particularly large impact on road conditions. Because traffic accidents are recorded in text, classifying their severity can not only unify the classification standards but also greatly help the querying and prediction of traffic accidents. The frequency of daily minor traffic accidents is much higher than that of serious traffic accidents, so this paper proposes an effective algorithm and model for categorizing imbalanced Chinese traffic accident texts based on severity. First, we establish a standard for refined classification granularity that incorporates expert knowledge, and propose unique accuracy standards to verify the impact of the processed dataset on the model. Then the noise in the dataset is removed, and finally the text is input into a Bidirectional Encoder Representations from Transformers (BERT) model, which uses a Recurrent Convolutional Neural Network (RCNN) model as a classifier for fine-tuning. Comparison with the results of other classification methods verifies that our algorithm and model have the best accuracy.

  • Research Article
  • Citations: 12
  • 10.28991/cej-2024-010-06-013
Utilizing GIS and Machine Learning for Traffic Accident Prediction in Urban Environment
  • Jun 1, 2024
  • Civil Engineering Journal
  • Atif Ali Khan + 1 more

Traffic accident prediction is crucial to preventive measures against accidents and effective traffic management. Identifying hotspots can facilitate the selection of the most critical survey points to note the contributing features. In this research, an effort has been made to identify hotspots and predict traffic accident occurrences in an urban area. Accident data was obtained from the Rescue 1122 Emergency Services of Faisalabad, and hotspots were identified using Moran’s I in ArcGIS. Results showed that most hotspots were located around the General Transport Stand (GTS) area due to the maximum number of road users. The temporal investigations showed that the accident occurrence was significant from 1 to 2 p.m. The identified hotspots were further investigated by conducting a field survey. Essential features such as road geometric features, road furniture, and traffic data were used for developing Machine Learning Algorithms for accident prediction. Using Computer Vision, traffic data was extracted from recorded videos. Random forest, linear regression, and Decision tree algorithms were developed using Python in the Jupyter Notebook environment. The decision tree algorithm showed a maximum accuracy of 84.4%. The analysis of contributing factors revealed that road measurements had the maximum effect on accident occurrence.
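
The Moran's I statistic mentioned above measures whether similar values cluster in space. The sketch below computes the textbook global form from a list of values and a spatial weight matrix. It is an illustration only: the abstract does not say which Moran's I variant or ArcGIS tool settings were used, and the function name, weight matrix, and counts here are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (for example 1 for neighbors and 0 otherwise), with a zero diagonal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


# Hypothetical example: four road segments in a row, adjacent ones are neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], chain))  # positive: similar counts sit together
print(morans_i([1, 5, 1, 5], chain))  # negative: high and low counts alternate
```

Positive values indicate that neighboring segments have similar accident counts, which is the pattern a hotspot analysis looks for; values near zero suggest no spatial structure.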

  • Research Article
  • Citations: 1
  • 10.12652/ksce.2015.35.6.1297
Development of Traffic Accident frequency Prediction Model by Administrative zone - A Case of Seoul
  • Dec 31, 2015
  • JOURNAL OF THE KOREAN SOCIETY OF CIVIL ENGINEERS
  • Ji Yeon Hong + 2 more

In Korea, the local traffic safety master plan has been established and implemented according to the Traffic Safety Act. Each local government is required to establish a customized traffic safety policy and share roles for improvement of traffic safety and this means that local governments lead and promote effective local traffic safety policies fit for local circumstances in substance. For implementing efficient traffic safety policies, which accord with many-sided characteristics of local governments, the prediction of community-based traffic accidents, which considers local characteristics and the analysis of accident influence factors must be preceded, but there is a shortage of research on this. Most of existing studies on the community-based traffic accident prediction used social and economic variables related to accident exposure environments in countries or cities due to the limit of collected data. For this reason, there was a limit in applying the developed models to the actual reduction of traffic accidents. Thus, this study developed a local traffic accident prediction model, based on smaller regional units, administrative districts, which were not omitted in existing studies and suggested a method to reflect traffic safety facility and policy variables that traffic safety policy makers can control, in addition to social and economic variables related to accident exposure environments, in the model and apply them to the development of local traffic safety policies. The model development result showed that in terms of accident exposure environments, road extension, gross floor area of buildings, the ratio of bus lane installation and the number of crossroads and crosswalks had a positive relation with accidents and the ratio of crosswalk sign installation, the number of speed bumps and the results of clampdown by police force had a negative relation with accidents.

  • Research Article
  • Citations: 16
  • 10.1007/s00521-011-0559-9
Automatic determination of traffic accidents based on KMC-based attribute weighting
  • Feb 24, 2011
  • Neural Computing and Applications
  • Kemal Polat + 1 more

In this study, risk factors related to environmental (climatological) conditions that are associated with motor vehicle accidents on the Konya-Afyonkarahisar highway were identified with the aid of Geographical Information Systems (GIS), using a combination of K-means clustering (KMC)-based attribute weighting (KMCAW) and classifier algorithms including artificial neural network (ANN) and adaptive network-based fuzzy inference system (ANFIS). The locations of the motor vehicle accidents were identified by the dynamic segmentation process in ArcGIS9.0 from the traffic accident reports recorded by the District Traffic Agency. The traffic accident dataset comprises five attributes (day, temperature, humidity, weather conditions, and month of the accident) and 358 observations, 179 without accident and 179 with accident. The proposed method comprises two stages. In the first stage, all attributes of the dataset are weighted using the KMCAW method. This weighting aims both to increase the classification performance of the classifier and to transform the linearly non-separable traffic accident dataset into a linearly separable one. In the second stage, after the weighting process, the ANN and ANFIS classifiers are used separately to label each case as with accident or without accident. To evaluate the performance of the proposed method, the classification accuracy, sensitivity, specificity, and area under the ROC (Receiver Operating Characteristic) curve (AUC) values were used. While the ANN and ANFIS classifiers obtained overall prediction accuracies of 53.93% and 38.76%, respectively, the combination of KMCAW and ANN and the combination of KMCAW and ANFIS achieved overall prediction accuracies of 74.15% and 55.06%. The experimental results demonstrate that the proposed attribute weighting method, KMCAW, is a robust and effective data pre-processing method for the prediction of traffic accidents on the Konya-Afyonkarahisar highway in Turkey.
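
The accuracy, sensitivity, and specificity figures named in this abstract all derive from a binary confusion matrix. The sketch below shows the standard definitions, with 1 standing for "with accident" and 0 for "without accident". The function name and the labels are hypothetical and are not the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = accident)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: share of actual accident cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of non-accident cases correctly left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels for six observations.
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1]
print(binary_metrics(actual, predicted))
```

With a balanced dataset such as the 179/179 split described here, an accuracy near 50% means the classifier does little better than guessing, which puts the reported unweighted accuracies in context.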

  • Research Article
  • Citations: 21
  • 10.1111/exsy.12035
Extracting grey relational systems from incomplete road traffic accidents data: the case of Gauteng Province in South Africa
  • Jun 10, 2013
  • Expert Systems
  • Bhekisipho Twala

Motivation: Road traffic accidents are among the top leading causes of deaths and injuries of various levels in South Africa. With the wealth and huge amount of data generated from road traffic accidents, the issue of traffic accident prediction has become a central challenge in the field of transportation data analysis. Such accident prediction is designed to detect patterns involved in dangerous crashes and thus help decision making and planning before casualty and loss occur. Recently, numerous researchers have presented a wide range of prediction techniques. Most of these methods are based on statistical studies but usually fail to explain the insights of prediction results. This has led to the development and application of supervised learning algorithms (classifiers) in an attempt to provide more accurate accident prediction in terms of injury severity (fatal/serious/slight/property damage with no injury). Even then, the task of learning an accurate classifier from instances raises a number of new issues, some of which have not been properly addressed by transportation research. Thus, an effective prediction method is required for improving predictive accuracy.

Results: The essence of the paper is the proposal that prediction of accidents given poor data quality (in terms of incomplete data) can be improved by using a classifier based on grey relational analysis, a similarity-based method. We evaluate the grey relational classifier against other state-of-the-art classifiers including artificial neural networks, classification and regression trees, k-nearest neighbour, linear discriminant analysis, naïve Bayes classifier, algorithm quasi-optimal and support vector machines. A real-world road traffic accident dataset is utilized for this task. Experimental results are provided to illustrate the efficiency and the robustness of the grey relational classifier algorithm in terms of road traffic accident predictive accuracy.

  • Research Article
  • Citations: 29
  • 10.1016/j.cie.2022.108924
A data-driven rule-based system for China’s traffic accident prediction by considering the improvement of safety efficiency
  • Dec 23, 2022
  • Computers & Industrial Engineering
  • Fei-Fei Ye + 3 more

  • Conference Article
  • Citations: 2
  • 10.1145/3669754.3669816
Daily Power Consumption Plan Derivation via the Monte Carlo-Based Regression Tree Algorithm
  • Apr 26, 2024
  • Genrawan Hoendarto + 2 more

Various methods and algorithms are used in predicting electricity consumption. Data-based methods can produce mathematical models with efficiency and good accuracy and do not require professional knowledge. In this research, electricity consumption prediction is performed by training a Monte Carlo (MC) simulation on each leaf generated by the regression tree (RT) algorithm. The prediction no longer relies on the average of the samples contained in the leaf, but on the sample probabilities. The regression tree algorithm often gives overfitted results, and training each leaf is meant to eliminate this. The dataset used to train and test the proposed method was obtained from the Trapeznikov Institute of Control Sciences (TICS), Russia, because it was adequately recorded. The proposed Monte Carlo Regression Tree (MCRT) algorithm is trained on monthly data and tested on different months' data. The results are used to make predictions of the daily usage trend to determine if there is any irregularity in electricity consumption.

  • Research Article
  • 10.33003/fjs-2025-0903-3321
AN ENHANCED CLASSIFICATION AND REGRESSION TREE ALGORITHM USING GINI EXPONENTIAL
  • Mar 31, 2025
  • FUDMA JOURNAL OF SCIENCES
  • Safinatu Bello + 6 more

Decision tree algorithms, particularly Classification and Regression Trees (CART), are widely used in machine learning for their simplicity, interpretability, and ability to handle both categorical and numerical data. However, traditional decision trees often encounter limitations when dealing with complex, high-dimensional, or imbalanced datasets, as conventional impurity measures such as the Gini Index and Information Gain may fail to capture subtle variations in the data effectively. This study enhances the traditional Classification and Regression Trees (CART) model by introducing the Gini Exponential Criterion, which incorporates an exponential weighting factor into the split point calculation process. This novel approach amplifies the influence of highly discriminative features, resulting in more refined splits and improved decision boundaries. The enhanced CART model was evaluated on two benchmark datasets: the wine quality dataset and the hypothyroid dataset, with preprocessing steps like feature scaling and SMOTE for class imbalance, and hyperparameter tuning via Bayesian Optimization. On the wine quality dataset, the enhanced model improved accuracy from 57% (traditional CART) to 86%, while on the hypothyroid dataset, it achieved an impressive accuracy of 98%. These results highlight the model's ability to handle complex and imbalanced data effectively. Feature importance analysis and decision tree visualization further demonstrated the model's interpretability. The study concludes that the Gini Exponential Criterion significantly improves CART's performance, offering better generalization and clearer decision boundaries. This advancement is particularly valuable for applications requiring precise and interpretable predictions, such as healthcare diagnostics and quality assessment. Future work could explore integrating this criterion into ensemble methods and...

  • Research Article
  • Citations: 48
  • 10.3846/16484142.2014.915581
IDENTIFYING THE KEY RISK FACTORS OF TRAFFIC ACCIDENT INJURY SEVERITY ON SLOVENIAN ROADS USING A NON-PARAMETRIC CLASSIFICATION TREE
  • Jun 9, 2014
  • Transport
  • Vesna Rovšek + 2 more

From both a practical and economic point of view, road transport meets almost all the requirements of modern life, but it is also a source of numerous negative effects, including traffic accidents. In order to design a safe transport system and achieve the ‘zero vision’ goal – no serious injuries or fatalities in traffic accidents – there is a growing need for a systematic approach to this problem. Prior to the assessment of any accident prevention measure it is necessary to identify the most important factors and significant patterns which affect the severity of accidents and injuries. In this study, the crash data from Slovenia pertaining to the period 2005–2009 were analysed with a Classification and Regression Tree (CART) algorithm, one of the most widely applied data mining techniques for analysing a large amount of data with several independent quantitative or qualitative variables. Before building a non-parametric classification tree, the data were split into three totally separate subsets: the training set, the testing set, and the evaluation set. Moreover, the influence of nine independent variables on the target variables was calculated using the Variable Importance Measure (VIM). The results confirm that traffic accidents and injuries on Slovenian roads are caused by a combination of factors, the most important of them being human error, or more precisely, speeding and driving in the wrong lane.
