Abstract

Human head statistics is widely used in the construction of smart cities and has great market value. To address the problem of missing pedestrian features and poor statistics results in large-angle overlooking scenes, we propose a human head statistics system that consists of head detection, head tracking and head counting. The proposed You-Only-Look-Once-Head (YOLOv5-H) network, improved from YOLOv5, is taken as the head detection benchmark; the DeepSORT algorithm with the Fusion-Hash algorithm for feature extraction (DeepSORT-FH) is proposed to track heads; and heads are counted by the proposed cross-boundary counting algorithm based on scene segmentation. Specifically, Complete-Intersection-over-Union (CIoU) is taken as the loss function of YOLOv5-H so that the predicted boxes match the real boxes more closely. The results demonstrate that the recall rate and mAP@.5 of the proposed YOLOv5-H reach 94.3% and 93.1%, respectively, on the SCUT_HEAD dataset. The statistics system has a low error rate of 3.5% on the TownCentreXVID dataset while maintaining a frame rate of 18 FPS, which meets the needs of human head statistics in monitoring scenarios and has good application prospects.
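
To make the role of CIoU concrete, the following is a minimal sketch of the standard CIoU loss for a single pair of axis-aligned boxes. It is not taken from the paper's code: the function name `ciou_loss`, the `(x1, y1, x2, y2)` box format and the `eps` stabiliser are assumptions made for illustration. Beyond plain IoU, the loss adds a penalty for the distance between box centres and a penalty for mismatched aspect ratios.

```python
import math

def ciou_loss(pred, target, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; box format and eps are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centres.
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)
```

Unlike a plain IoU loss, this value keeps growing as two non-overlapping boxes move further apart, so it still provides a useful training signal when the predicted and real boxes do not intersect.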

Highlights

  • Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations

  • To address the poor detection results caused by the lack of pedestrian features in large-angle overlooking scenes, we propose an efficient human head statistics system that is composed of human head detection, multi-head tracking and head counting

  • We propose the YOLOv5-H network as the detection benchmark of the statistics system, where CIoU is taken as the loss function of the network so that the predicted boxes match the real boxes more closely

  • We propose a fusion hash algorithm and include it in the DeepSORT [15] algorithm for feature extraction to track human heads

  • We propose a cross-boundary counting algorithm based on scene segmentation to count human heads

  • We evaluate the detection performance of the improved YOLOv5-H on the SCUT_HEAD dataset
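
The idea of counting tracked heads as they cross a boundary can be sketched generically as follows. This is not the authors' scene-segmentation-based algorithm, whose details are not given here; it is a plain line-crossing counter under stated assumptions. The names `side` and `count_crossings`, the track format (a dict mapping a track ID to a list of head-centre points) and the labels "forward" and "backward" are all hypothetical.

```python
def side(point, a, b):
    """Sign of the 2D cross product: which side of the line a->b the point lies on."""
    return (b[0] - a[0]) * (point[1] - a[1]) - (b[1] - a[1]) * (point[0] - a[0])

def count_crossings(tracks, a, b):
    """Count how many times tracked head centres cross the line through a and b.

    tracks: dict of track_id -> list of (x, y) centres, one per frame (assumed format).
    The boundary is treated as an infinite line, not a finite segment.
    """
    counts = {"forward": 0, "backward": 0}
    for centres in tracks.values():
        prev = None
        for p in centres:
            s = side(p, a, b)
            if s == 0:
                continue  # exactly on the line: wait until the head is clearly on one side
            if prev is not None and (s > 0) != (prev > 0):
                # The sign flipped between consecutive observations, so the line was crossed.
                counts["forward" if s > 0 else "backward"] += 1
            prev = s
    return counts
```

Because crossings are counted per track ID rather than per detection, a head that lingers near the boundary is only counted when its side actually changes, which is why a stable tracker matters for the final count.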


Summary

Introduction

The Region-CNN (R-CNN) proposed by Girshick [5] is a pioneering object detection algorithm based on deep learning. R-CNN follows a two-stage idea: the first stage extracts candidate regions using the selective search method, and the second stage extracts image features from those regions using CNNs. The advantage of this algorithm is its high detection accuracy. The authors of [7] proposed a model that is based on decoding an image for the detection of groups of people. They used a recurrent LSTM layer for sequence generation and trained the model end-to-end with a new loss function.

Related Work
YOLOv5
Backbone
DeepSORT
The Algorithm Architecture of Human Head Statistics System
Detection
Tracking
Statistics
Experimental Results and Discussion
Dataset
Detection Accuracy of YOLOv5-H
The Accuracy of Human Head Statistics System
Conclusions