Abstract

The concept of searching for and localizing vehicles in live traffic videos based on descriptive textual input has yet to be explored in the scholarly literature. Endowing Intelligent Transportation Systems (ITS) with such a capability could help solve crimes on roadways. One major impediment to the advancement of fine-grain vehicle recognition models is the lack of video testbench datasets with annotated ground truth data. Additionally, to the best of our knowledge, no metrics currently exist for evaluating the robustness and performance efficiency of a vehicle recognition model on live videos, and even less so for vehicle search and localization models. In this paper, we address these challenges by proposing V-Localize, a novel artificial intelligence framework for vehicle search and continuous localization in live traffic videos based on input textual descriptions. An efficient hashgraph algorithm is introduced to compute valid target information from the textual input. This work further introduces two novel datasets to advance AI research in these challenging areas: (a) the most diverse and large-scale Vehicle Color Recognition (VCoR) dataset, with 15 color classes (twice as many as in the largest existing such dataset) to facilitate finer-grain recognition with color information; and (b) the Vehicle Recognition in Video (VRiV) dataset, a first-of-its-kind video testbench for evaluating the performance of vehicle recognition models on live videos rather than still image data. The VRiV dataset will open new avenues for AI researchers to investigate innovative approaches that were previously intractable due to the lack of an annotated traffic vehicle recognition video testbench. Finally, to address the gap in the field, five novel metrics are introduced for adequately assessing the performance of vehicle recognition models on live videos. Ultimately, the proposed metrics could also prove effective for quantitative model evaluation in other video recognition applications. One major advantage of the proposed vehicle search and continuous localization framework is that it can be integrated into ITS software solutions to aid law enforcement, especially in critical cases such as Amber Alerts or hit-and-run incidents.
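
The hashgraph algorithm itself is presented later in the paper; as a rough illustration of what computing a valid search target from free-form text can look like, consider the minimal sketch below. Everything in it (the extract_target helper, the vocabularies, and the target fields) is hypothetical and assumes a simple hash-map lookup per token; it is not the authors' implementation.

    # Hypothetical vocabularies mapping known tokens to target fields.
    # These sets stand in for the much larger make/color inventories a
    # real system would carry (VCoR, for instance, covers 15 colors).
    MAKES = {"toyota", "honda", "ford"}
    COLORS = {"white", "black", "red", "beige"}

    def extract_target(description: str) -> dict:
        """Scan a textual description and collect valid target attributes
        using a constant-time hash lookup per token."""
        target = {"make": None, "color": None, "year": None}
        for token in description.lower().replace(",", " ").split():
            if token in MAKES and target["make"] is None:
                target["make"] = token
            elif token in COLORS and target["color"] is None:
                target["color"] = token
            elif token.isdigit() and len(token) == 4:
                target["year"] = int(token)
        return target

    # Example: a witness statement reduced to a structured search query.
    print(extract_target("Suspect fled in a white Toyota, maybe a 2014 model"))
    # {'make': 'toyota', 'color': 'white', 'year': 2014}

The appeal of a hash-based structure here is presumably that each token is resolved in constant time, which keeps the text-processing front end cheap enough to trigger live localization queries.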

Highlights

  • The performance of the VMMYC vehicle recognition model was compared across four backbone architectures: ResNet, RepVGG, MobileNetV3, and VGG16 (a minimal sketch of such a backbone swap appears after these highlights)

  • The results suggest that the RepVGG backbone outperformed all other recognition backbones on all metrics

  • The search problem is explored by introducing a novel Hashgraph algorithm that efficiently processes input documents and computes valid target descriptions, which are used to trigger a query of the continuous localization model
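
As a rough illustration of the backbone comparison in the first highlight, the sketch below builds the four architectures behind a common classification head using the timm model zoo. The model names, class count, and input size are illustrative assumptions, not the authors' training configuration.

    import timm
    import torch

    NUM_CLASSES = 100  # hypothetical number of make-model-year-color classes

    BACKBONES = ["resnet50", "repvgg_a2", "mobilenetv3_large_100", "vgg16"]

    def build_model(name: str) -> torch.nn.Module:
        # timm rebuilds the final classifier for the requested class count,
        # so every backbone exposes the same (B, NUM_CLASSES) output and can
        # be trained and evaluated through an identical pipeline.
        return timm.create_model(name, pretrained=False, num_classes=NUM_CLASSES)

    if __name__ == "__main__":
        dummy = torch.randn(1, 3, 224, 224)
        for name in BACKBONES:
            model = build_model(name).eval()
            with torch.no_grad():
                logits = model(dummy)
            print(f"{name}: output shape {tuple(logits.shape)}")

Keeping the head and pipeline fixed while only the backbone varies is what makes a per-backbone, per-metric comparison like the one highlighted above meaningful.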


Introduction

It is important to incorporate such challenges into the training process for improved vehicle recognition [1,6,12,13,14]. These existing methods perform best on high-quality still input image data in which distinctive features such as the manufacturer logo, the bumper, the headlights, the taillights, and the chassis are visible. Recognition models' performance declines when they are not provided with high-quality input data, impairing their ability to perform real-time recognition on live-traffic video feeds; this remains a gating issue preventing the search for and localization of a vehicle based on its textual description.

Comparison
Related Work
Materials and Methodology
Training Pipeline for the Finer-Grain Vehicle Recognition Model
Hashgraph for Converting Textual Input to Valid Search Target
Evaluation
Experimental Results and Discussions
Conclusions
