Abstract

To achieve intelligent perception of traffic video and overcome the dispersion of computing modules and the separate processing of multiple tasks, the efficient chained centre network (ECCNet) is proposed as a unified framework that simultaneously accomplishes detection, vehicle classification, tracking, and vehicle speed estimation. First, to address the speed-accuracy trade-off, CA-CenterNet is presented, which detects vehicles and classifies vehicle types more accurately for the downstream cross-frame tasks by embedding coordinate attention. Second, 3D convolution is employed to construct self-adaptive branches for data association and speed estimation. This self-adaptive approach leverages deep learning to enhance tracking performance and avoids camera calibration by capturing motion information across frames. Moreover, a chained structure is adopted to reuse the backbone feature map, so the spatio-temporal information of adjacent frames can be extracted at almost no additional cost. Finally, the above single-frame and cross-frame tasks are integrated into a unified multi-task collaborative optimization model. The effectiveness of ECCNet is verified with experiments on the UA-DETRAC dataset. ECCNet achieves 55.5% MOTA, an F1 score of 0.76, and an MAE of 3.10 in the tracking, classification, and speed estimation tasks, respectively, at an inference speed of 32.5 Hz.
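The coordinate attention mentioned above can be illustrated with a minimal sketch: the idea is to pool the feature map separately along the height and width axes, producing two direction-aware attention vectors that reweight the features. The sketch below uses plain NumPy and simple linear projections (`w_h`, `w_w` are hypothetical stand-ins for the 1×1 convolutions of the actual module); it is an illustration of the general mechanism under these assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_h, w_w):
    """Toy coordinate-attention sketch for a feature map x of shape (C, H, W).

    w_h, w_w: hypothetical (C, C) projection matrices standing in for the
    1x1 convolutions used by the real module.
    """
    # Direction-aware pooling: average along width and along height.
    pool_h = x.mean(axis=2)            # (C, H) - encodes vertical position
    pool_w = x.mean(axis=1)            # (C, W) - encodes horizontal position
    # Per-direction attention weights via a linear projection + sigmoid.
    att_h = sigmoid(w_h @ pool_h)      # (C, H)
    att_w = sigmoid(w_w @ pool_w)      # (C, W)
    # Reweight the feature map with both positional attention maps.
    return x * att_h[:, :, None] * att_w[:, None, :]

# Toy usage: a random 4-channel 8x8 feature map keeps its shape.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w_h = rng.standard_normal((4, 4)) * 0.1
w_w = rng.standard_normal((4, 4)) * 0.1
y = coordinate_attention(x, w_h, w_w)
print(y.shape)
```

Because the two attention maps are factorized along the spatial axes, the module adds only O(C·(H+W)) attention weights per feature map, which is why it can be embedded in a detector with little overhead.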
