In the United States, 683 people were killed and an estimated 133,000 were injured in 2012 in crashes caused by red-light running. To help prevent or mitigate these crashes, violations need to be identified before they occur so that both the road users in potential danger (e.g., drivers and pedestrians) and the infrastructure can be notified and act accordingly. Two data sets were used to assess the feasibility of developing red-light running (RLR) violation prediction models: (1) observational data and (2) driver simulator data. Both data sets included common factors, such as time to intersection (TTI), distance to intersection (DTI), and velocity at the onset of the yellow indication, but each also provided factors that the other did not. The observational data included vehicle information (e.g., speed and acceleration) extracted from several time frames as each vehicle drew closer to the intersection. Because the observational data were inherently anonymous, however, driver factors such as age and gender were unavailable. Conversely, the simulator data set contained age and gender, along with a secondary (non-driving) task factor and a treatment factor (i.e., incoming/outgoing calls while driving). The simulator data, however, included vehicle information only for certain time frames (e.g., yellow onset), not for several time frames during the approach to an intersection. In this study, the random forest (RF) machine-learning technique was adopted to develop the RLR violation prediction models.
Factor importance was computed for each model and each data set to show how the influence of individual factors on model performance varies. A sensitivity analysis showed that the importance of factors for identifying RLR violations changed when data from different time frames were used to develop the prediction models. TTI, DTI, the required deceleration parameter (RDP), and velocity at the onset of the yellow indication were among the most important factors identified by the models built from both the observational data and the simulator data. Furthermore, in addition to the factors obtained at a single point in time (i.e., yellow onset), valuable information for RLR violation prediction was obtained from defined monitoring periods. Period lengths of 2–6 m were found to yield the best model performance.
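As an illustration of the yellow-onset factors named above, the following minimal Python sketch derives TTI and RDP from DTI and velocity. The abstract does not define these quantities, so the formulas are assumptions: TTI is taken as DTI divided by velocity (constant-speed approach), and RDP as the deceleration needed to stop at the intersection expressed as a fraction of gravitational acceleration, v²/(2·g·DTI). The function and field names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2


@dataclass
class YellowOnsetFactors:
    dti_m: float         # distance to intersection, metres
    velocity_mps: float  # speed at yellow onset, metres per second
    tti_s: float         # time to intersection, seconds
    rdp: float           # required deceleration parameter, in units of g


def yellow_onset_factors(dti_m: float, velocity_mps: float) -> YellowOnsetFactors:
    """Derive TTI and RDP for one vehicle at the onset of yellow.

    Assumed definitions (not stated in the abstract):
      TTI = DTI / v          (constant-speed travel time to the stop line)
      RDP = v^2 / (2 g DTI)  (constant deceleration needed to stop, in g)
    """
    if dti_m <= 0 or velocity_mps <= 0:
        raise ValueError("dti_m and velocity_mps must both be positive")
    tti_s = dti_m / velocity_mps
    rdp = velocity_mps ** 2 / (2 * G * dti_m)
    return YellowOnsetFactors(dti_m, velocity_mps, tti_s, rdp)


if __name__ == "__main__":
    # Hypothetical vehicle: 50 m from the stop line, travelling at 20 m/s.
    print(yellow_onset_factors(50.0, 20.0))
```

Under these assumed definitions, each vehicle yields one row of factors that could be passed, together with an RLR/no-RLR label, to a random forest classifier.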