Abstract
Small-object detection is a fundamental and challenging problem in computer vision, with wide application in pedestrian detection, traffic sign detection, and other fields. This paper proposes a deep learning method for small-object detection based on image super-resolution that improves both the speed and the accuracy of detection. First, we add a feature texture transfer (FTT) module at the input end to increase the image resolution and remove noise from the image. Second, in the backbone network, which is built on the Darknet53 framework, we replace residual blocks with dense blocks to reduce the number of network parameters and avoid unnecessary computation. Third, to make full use of the features of small targets in the image, the neck combines SPPnet and PANnet to perform multi-scale feature fusion. Finally, we address the imbalance between image foreground and background by adding a foreground-background balance loss term to the YOLOv4 loss function. Experiments on our self-built dataset show that the proposed method achieves higher accuracy and speed than currently available small-target detection methods.
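The spatial pyramid pooling step in the neck can be illustrated with a minimal, dependency-free sketch. It assumes the SPP variant used in YOLOv4, in which the same feature map is max-pooled at stride 1 with kernel sizes 5, 9, and 13 and the results are concatenated with the input along the channel axis. The function names `max_pool_same` and `spp` are hypothetical, and the sketch is not the authors' implementation; it only shows how several receptive-field sizes are fused without changing the spatial resolution.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling over a k x k window, clipped at the borders.

    Clipping the window is equivalent to padding with -inf, so the output
    has the same height and width as the input.
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                fmap[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp(channels, kernel_sizes=(5, 9, 13)):
    """Concatenate the input channels with their pooled versions.

    `channels` is a list of 2D lists (one per channel). The kernel sizes
    are an assumption taken from the YOLOv4 SPP block.
    """
    out = list(channels)
    for k in kernel_sizes:
        out.extend(max_pool_same(c, k) for c in channels)
    return out
```

With one input channel and three kernel sizes, `spp` returns four channels of the same spatial size, which a following convolution can then mix.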
Highlights
Although considerable progress has been made in object detection in recent years, a significant performance gap remains between the detection of small and large targets
We provide more small-target feature information, which is used in the feature texture transfer (FTT) module to improve the resolution of small-target features and remove noise in the image
We introduce the details of the experiment: the image SR evaluation result, the parameter setting of the backbone network, and the design of the loss function
Summary
Considerable progress has been made in object detection, but a significant performance gap remains between the detection of small and large targets. Early detection of masses and tumors is essential for an accurate early diagnosis. To address this problem, this study aims to detect small targets in college classrooms. We propose that schools estimate learning performance by using cameras to detect the head movements of students in a classroom. The obtained positional information can be sent to a head pose estimation model [3], which estimates head posture using deep learning and determines whether a student’s head is down or up in order to evaluate their listening state. The minimum resolution of a head in an image is 15 × 15 pixels.