Human detection in images using deep learning has been a popular research topic in recent years and has achieved remarkable performance. Training a human detection network is useful for first responders to search for trapped victims in debris after a disaster. In this paper, we focus on the detection of such victims using deep learning, and we find that state-of-the-art detection models pre-trained on the well-known COCO dataset fail to detect victims. This is because all the people in the training set are shown in photos of daily life or sports activities, while people in the debris after a disaster usually only have parts of their bodies exposed. In addition, because of the dust, the colors of their clothes or body parts are similar to those of the surrounding debris. Compared with collecting images of common objects, images of disaster victims are extremely difficult to obtain for training. Therefore, we propose a framework to generate harmonious composite images for training. We first paste body parts onto a debris background to generate composite victim images and then use a deep harmonization network to make the composite images look more harmonious. We select YOLOv5l as the most suitable model, and experiments show that using composite images for training improves the AP (average precision) by 19.4% (15.3%→34.7%). Furthermore, using the harmonious images is of great benefit to training a better victim detector, and the AP is further improved by 10.2% (34.7%→44.9%). This research is part of the EU project INGENIOUS. Our composite images and code are publicly available on our website.
Read full abstract