Abstract

The increasing prevalence of video data, particularly from traffic and surveillance cameras, is accompanied by a growing need for improved object detection, tracking, and classification techniques. To encourage development in this area, the AI City Challenge, sponsored by IEEE Smart World and NVIDIA, cultivated a competitive environment in which teams from around the world demonstrated the effectiveness of their models after training and testing on a common dataset of 114,766 unique traffic camera keyframes. Models were constructed for two distinct purposes: Track 1 designs addressed object detection, localization, and classification, while Track 2 designs aimed to produce novel approaches to traffic-related application development. Careful tuning of the Darknet framework's YOLO (You Only Look Once) architecture allowed us to achieve second-place scores in Track 1 of the competition. Our model achieved inference speeds above 50 frames per second (FPS) on the NVIDIA DGX-1's Tesla P100 GPU and up to 37 FPS on an NVIDIA GTX 1070 GPU; however, the NVIDIA Jetson TX2 edge device managed a lackluster 2 FPS. To produce truly competitive automated traffic control systems, either more performant edge device hardware or revolutionary neural network architectures are required. Our Track 2 approach demonstrated that useful traffic-related metrics can be obtained without the region proposal networks and classification methods typically associated with traffic control systems.
