Abstract
People detection in images has many uses today: face detection algorithms that help social network users tag other people, surveillance systems that estimate the population density in an area or identify a suspect, and, in the automotive industry, Pedestrian Crash Avoidance Mitigation (PCAM) systems. This work focuses on creating a fast and reliable object detection algorithm, trained on scenes that depict people in an indoor environment, starting from an existing state-of-the-art approach. The proposed method improves upon the You Only Look Once version 4 (YOLOv4) network by adding a region-of-interest classification and regression branch similar to the head of the Faster Region-Based Convolutional Neural Network (Faster R-CNN). The candidate bounding boxes proposed by YOLOv4 are ranked by confidence score, and the best candidates are kept and sent as input to the Faster R-CNN head. To keep only the best detections, non-maximum suppression is applied to all proposals. This decreases the number of false-positive candidate bounding boxes: in the non-maximum suppression step, low-confidence detections of the regression and classification branch are eliminated by the detections of YOLOv4, and vice versa. The method can serve as the object detection algorithm in an image-based people tracking system, namely Tracktor, with a higher inference speed than Faster R-CNN. The proposed method achieves an overall accuracy of 95% and an inference time of 22 ms.
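The ranking and suppression steps described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the helper names `iou`, `top_k`, and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 overlap threshold are all assumptions chosen for illustration, and a greedy non-maximum suppression is assumed.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top_k(boxes: Sequence[Box], scores: Sequence[float], k: int):
    """Rank candidates by confidence and keep the k best (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [boxes[i] for i in order], [scores[i] for i in order]


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In a pipeline like the one described, the detections of both branches would be pooled into one list before calling `nms`, so that a high-confidence box from either branch can suppress an overlapping low-confidence box from the other.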
Highlights
Object detection is a computer vision task whose objective is to find certain objects of interest in an image and assign a bounding box and a category to them
The candidate bounding boxes proposed by You Only Look Once version 4 (YOLOv4) are ranked by confidence score, and the best candidates are kept and sent as input to the head of the Faster Region-Based Convolutional Neural Network (Faster R-CNN)
This decreases the number of false-positive candidate bounding boxes: in the non-maximum suppression step, low-confidence detections of the regression and classification branch are eliminated by the detections of YOLOv4, and vice versa
Summary
Object detection is a computer vision task whose objective is to find certain objects of interest in an image and assign a bounding box and a category to each of them. Early object detection algorithms, such as the one proposed by Viola and Jones [1], identified preset features (Haar-like features in this case) in an image, which helped with regression and classification. Starting with R-CNN [2], the deep neural network approach became popular among researchers, and most state-of-the-art object detection algorithms now use this paradigm. Even if a deeper neural network performs better in theory, in practice the maximum depth is limited by vanishing gradients. This problem was addressed by He et al. [3] with the introduction of residual layers, which allow networks to be much deeper by facilitating the propagation of the gradient throughout the network. The network the authors proposed, ResNet, is used today as the backbone of many state-of-the-art detectors.
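As a brief illustration of why residual layers help the gradient propagate, consider a single residual layer in its standard form. The symbols below (input x, learned mapping F, output y, loss L) are notation introduced here for illustration and do not appear in the original text.

```latex
y = x + F(x)
\qquad\Longrightarrow\qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( I + \frac{\partial F}{\partial x} \right)
```

The identity term gives the gradient a direct path through the skip connection, so it does not have to pass through the learned mapping F at every layer. This is the sense in which residual layers ease the vanishing-gradient problem in deep networks.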