Traffic congestion detection in surveillance video is crucial for monitoring road traffic conditions and improving traffic operation efficiency. Currently, traffic congestion is often characterized through traffic density, which is obtained either by detecting vehicles or by holistic mapping methods. However, these traditional methods do not handle the variation in vehicle scale found in surveillance video, which prompts us to explore density-map-based traffic density estimation. Moreover, given the dynamic characteristics of traffic flow, relying solely on the spatial features of traffic density is overly limiting. To address these limitations, we propose a multi-task framework that simultaneously estimates traffic density and dynamic traffic congestion. Specifically, we first propose a Selective Scale-Aware Network (SSANet) to generate a traffic density map. Second, we generate a static congestion level directly from the traffic density map through a linear layer, which characterizes the spatial occupancy of traffic congestion in each frame. To further describe dynamic congestion, we also take the dynamic characteristics of traffic flow into account, integrating the overall traffic flow velocity with the static congestion estimate to produce a dynamic assessment of congestion. On our collected dataset, the proposed method achieves state-of-the-art results on both the congestion detection and density estimation tasks. SSANet also obtains 99.21% accuracy on the UCSD traffic flow classification dataset, outperforming other state-of-the-art methods.
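As a rough illustration of the pipeline described above (density map, then static congestion level, then dynamic congestion), the following minimal sketch uses NumPy only. It is not SSANet: the density map here is built from hypothetical point annotations with unit-mass Gaussian kernels, a common convention in density-map counting, rather than predicted by a network. The function names, the sigmoid on top of the linear readout, the weights, and the velocity fusion rule are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def gaussian_density_map(points, shape, sigma=2.0):
    """Place one unit-mass Gaussian per annotated vehicle centre (row, col)."""
    h, w = shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    density = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # normalise so each vehicle sums to 1
    return density


def static_congestion(density, weights, bias):
    """Hypothetical linear readout: flatten the map, apply w.x + b, squash to (0, 1)."""
    score = float(density.ravel() @ weights + bias)
    return 1.0 / (1.0 + np.exp(-score))


def dynamic_congestion(static_level, mean_speed, free_flow_speed):
    """Hypothetical fusion: weight static occupancy by the relative slowdown of the flow."""
    ratio = min(max(mean_speed / free_flow_speed, 0.0), 1.0)
    return static_level * (1.0 - ratio)


# Three hypothetical vehicle centres in a 32 x 48 frame.
density = gaussian_density_map([(10, 12), (20, 30), (25, 8)], shape=(32, 48))
count = density.sum()  # close to 3.0: one unit of mass per vehicle

# Placeholder weights; in a trained model these would be learned.
weights = np.ones(32 * 48)
static_level = static_congestion(density, weights, bias=-2.0)
dynamic_level = dynamic_congestion(static_level, mean_speed=3.0, free_flow_speed=12.0)
```

Because each kernel is normalised to unit mass, summing the map recovers the vehicle count, which is what makes a density map usable as an input to a congestion readout. Under the assumed fusion rule, a dense but freely moving scene scores lower than an equally dense scene that is nearly stationary.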