Abstract

Intelligent video surveillance plays a pivotal role in the infrastructure of smart urban environments. The seamless integration of multi-angle cameras, functioning as perceptive sensors, significantly improves pedestrian detection and strengthens security in smart cities. Nevertheless, current pedestrian-oriented target detection suffers from slow detection speeds and high deployment costs. To address these challenges, we introduce YOLOv5-MS, a YOLOv5-based model for target detection. First, we optimize multi-threaded video-stream acquisition within YOLOv5 to ensure image stability and real-time performance. Second, leveraging structural reparameterization, we replace the original backbone convolutions with RepVGGBlock and reduce the number of convolutional channels, thereby streamlining the model and increasing inference speed. Third, incorporating a bioinspired squeeze-and-excitation (SE) module into the convolutional neural network improves detection accuracy by sharpening the focus on targets and suppressing the influence of irrelevant regions. Furthermore, applying K-means anchor clustering and bioinspired Retinex image enhancement during training further improves detection performance. Finally, Focal-EIoU is adopted for loss computation. Experiments on our self-built smart-city dataset show that YOLOv5-MS achieves 96.5% mAP, a 2.0% improvement over YOLOv5s, while average inference speed increases by 21.3%. These results demonstrate the model's superiority and its ability to perform pedestrian detection effectively across an intranet of more than 50 video surveillance cameras, in line with our stringent requirements.
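
To illustrate the attention mechanism referenced in the abstract, the following is a minimal sketch of a squeeze-and-excitation (SE) block in PyTorch. The class name, the reduction ratio of 16, and the toy usage example are assumptions for illustration only and do not reproduce the paper's exact module or its placement in the YOLOv5-MS backbone.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Illustrative squeeze-and-excitation block (a sketch, not the paper's exact implementation).

    Squeeze: global average pooling collapses each channel to one statistic.
    Excitation: a two-layer bottleneck MLP produces per-channel weights in (0, 1),
    which rescale the input feature map so informative channels are emphasized
    and irrelevant ones are suppressed.
    """

    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is an assumed default
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),  # per-channel attention weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)          # (B, C)
        w = self.fc(w).view(b, c, 1, 1)      # (B, C, 1, 1)
        return x * w                         # excitation: reweight the channels


if __name__ == "__main__":
    # Example: reweight a feature map as it might appear inside a detector backbone.
    feat = torch.randn(2, 256, 20, 20)       # hypothetical feature-map shape
    print(SEBlock(256)(feat).shape)          # torch.Size([2, 256, 20, 20])

In practice such a block is typically inserted after selected backbone stages; because it only adds two small fully connected layers per stage, the extra computation is modest relative to the accuracy gain it targets.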
