Abstract

Seamless, accurate, and reliable traffic information is critical for planning and operational strategies in traffic management. Urban networks are increasingly equipped with various types of detectors that collect continuous traffic data. However, installing and maintaining these detectors at every intersection and road section is limited by cost and other practical constraints. Moreover, directly integrating and utilizing heterogeneous data sources is challenging because of their distinct yet interconnected characteristics. This study introduces a multimodal deep learning model that combines closed-circuit television (CCTV) and dedicated short-range communication (DSRC) data to estimate lane-level urban traffic volumes. The proposed model employs a multilayer perceptron to extract features from each modality. These features are then fused and fed into a recurrent neural network that estimates the lane-level traffic volume. We present the multimodal model in three forms: (1) fusion of traffic volume, occupancy, and queue length from CCTV data; (2) fusion of traffic volume from CCTV data with travel time from DSRC data; and (3) fusion of different attributes from CCTV and heterogeneous DSRC data. In addition, we develop a single-modality model that uses only CCTV traffic volume data, to compare performance with the proposed multimodal model and to identify scenarios where the multimodal approach is essential. The proposed model demonstrates significant improvements over the single-modality model, yielding higher accuracy at finer temporal resolutions and for lanes that permit left or right turns.
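To make the described architecture concrete, the following is a minimal sketch of the fusion pattern outlined in the abstract: one multilayer perceptron per modality extracts features, the features are concatenated, and a recurrent network maps the fused sequence to lane-level volume estimates. This is not the authors' implementation; the framework (PyTorch), the choice of GRU as the recurrent unit, and all layer sizes, input dimensions, and the lane count are illustrative assumptions.

```python
import torch
import torch.nn as nn


class MultimodalVolumeEstimator(nn.Module):
    """Sketch of MLP-per-modality feature extraction, fusion, and an RNN head."""

    def __init__(self, cctv_dim=3, dsrc_dim=1, feat_dim=32, hidden_dim=64, n_lanes=4):
        super().__init__()
        # Modality-specific MLP feature extractors (sizes are assumptions)
        self.cctv_mlp = nn.Sequential(
            nn.Linear(cctv_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.dsrc_mlp = nn.Sequential(
            nn.Linear(dsrc_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        # Recurrent model over the fused feature sequence
        self.rnn = nn.GRU(2 * feat_dim, hidden_dim, batch_first=True)
        # Per-time-step lane-level volume estimates
        self.head = nn.Linear(hidden_dim, n_lanes)

    def forward(self, cctv_seq, dsrc_seq):
        # cctv_seq: (batch, time, cctv_dim), e.g. volume, occupancy, queue length
        # dsrc_seq: (batch, time, dsrc_dim), e.g. travel time
        fused = torch.cat([self.cctv_mlp(cctv_seq), self.dsrc_mlp(dsrc_seq)], dim=-1)
        out, _ = self.rnn(fused)
        return self.head(out)


# Usage with random tensors standing in for interval-aggregated detector data
model = MultimodalVolumeEstimator()
cctv = torch.randn(8, 12, 3)   # 8 samples, 12 time steps, 3 CCTV attributes
dsrc = torch.randn(8, 12, 1)   # matching DSRC travel-time sequence
volumes = model(cctv, dsrc)    # (8, 12, 4) lane-level volume estimates
```

The single-modality baseline mentioned in the abstract would correspond to dropping the DSRC branch and feeding only the CCTV features to the recurrent layer.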
