Abstract
In the field of computer vision, object detection consists of automatically finding objects in images by giving their positions. The most common fields of application are safety systems (pedestrian detection, identification of behavior) and control systems. Another important application is head/person detection, which is the primary material for road safety, rescue, surveillance, etc. In this study, we developed a new approach based on two parallel Deeplapv3+ to improve the performance of the person detection system. For the implementation of our semantic segmentation model, a working methodology with two types of ground truths extracted from the bounding boxes given by the original ground truths was established. The approach has been implemented in our two private datasets as well as in a public dataset. To show the performance of the proposed system, a comparative analysis was carried out on two deep learning semantic segmentation state-of-art models: SegNet and U-Net. By achieving 99.14% of global accuracy, the result demonstrated that the developed strategy could be an efficient way to build a deep neural network model for semantic segmentation. This strategy can be used, not only for the detection of the human head but also be applied in several semantic segmentation applications.
Highlights
Every day, more data is provided by the sensors around us
Saqib et al [24] used the state-of-the-art object detectors based on deep convolutional neural networks (R-CNN, Faster R-CNN, YOLO v2, and SDD) to detect human heads in natural scenes
We found that the model proposed with the ResNet50 backbone architecture obtained the best result in terms of precision, Intersection over Union (IoU) and boundary F1 (BF) score
Summary
More data is provided by the sensors around us. These sensors are themselves more precise and more talkative. The field of computer vision involves processing the data provided by the many image sensors available to us This is done in order to enable a computer to perform specific tasks without the help of humans. This is due to the small size of the dataset and the preparation of the ground truth by a human who can make an error at any time To solve those difficulties, a new end-to-end strategy (which accepts raw images as input and directly generates a set of bounding boxes of objects as output) based on. The result demonstrated that the fusion of the two parallel networks that we made could be an efficient way to build a deep neural network model for semantic segmentation This can be applied, for the detection of the human head, and in several semantic segmentation applications.
Published Version (
Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have