Abstract: Computer vision is one of the prime domains that enables the extraction of meaningful and precise information from digital media, such as images, videos, and other visual inputs. Background: Detecting and correctly tracking moving objects in a video stream is still a challenging problem in India. Because of the high density of vehicles, it is difficult to correctly identify objects on the roads. Methods: In this work, we used the YOLOv5 (You Only Look Once) algorithm to identify different objects on the road, such as trucks, cars, trams, and vans. YOLOv5 is one of the recent algorithms in the YOLO family. To train YOLOv5, the KITTI dataset was used, comprising 11682 images containing different objects in a traffic surveillance setting. After training and validation on this dataset, three models were constructed with different parameter settings. To further validate the proposed approach, results were also evaluated on the Indian traffic dataset DATS_2022. Results: All models were evaluated using three performance metrics: precision, recall, and mean average precision (mAP). The final model attained the best performance on the KITTI dataset, with 93.5% precision, 90.7% recall, and 0.67 mAP across the different objects. The results on the Indian traffic dataset DATS_2022 were 0.65 precision, 0.78 recall, and 0.74 mAP across the different objects. Conclusion: The results indicate that the proposed model improves on state-of-the-art approaches in terms of performance and also reduces computation time and object loss.
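For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision and recall are conventionally computed from detection counts. It is not the authors' evaluation code; the function name `precision_recall` and the example counts are hypothetical and chosen only for illustration. mAP is not computed here; it is typically obtained by averaging, over object classes, the area under each class's precision-recall curve.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from detection counts.

    tp: detections matched to a ground-truth object (true positives)
    fp: detections with no matching ground-truth object (false positives)
    fn: ground-truth objects that were not detected (false negatives)
    """
    # Precision: fraction of detections that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of ground-truth objects that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, not taken from the paper.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.90, recall=0.75
```

A detection is usually counted as a true positive when its bounding box overlaps a ground-truth box of the same class above a chosen intersection-over-union threshold; the threshold used in this work is not stated in the abstract.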