Abstract

Despite breakthroughs in the accuracy and efficiency of object detection with deep neural networks, small object detection performance remains far from satisfactory. Gaze estimation has advanced considerably with the development of visual sensors, and combining it with object detection can significantly improve small object detection. This paper presents a centered multi-task generative adversarial network (CMTGAN) that combines small object detection and gaze estimation. To this end, we propose a generative adversarial network (GAN) capable of both image super-resolution and two-stage small object detection: the generator performs image super-resolution and the discriminator performs object detection. We introduce an artificial texture loss into the generator to retain the original features of small objects. We also apply a centered mask in the generator so that the network focuses on the central part of the image, where small objects are more likely to appear in our method. The discriminator, trained with a detection loss for two-stage small object detection, can be adapted to other GANs for object detection. Compared with existing interpolation methods, the super-resolution images generated by CMTGAN are clearer and contain more information. Experiments show that our method achieves better detection performance than mainstream methods.
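To make the idea of a centered mask concrete, the following is a minimal sketch of a centre-weighted reconstruction loss. It is not the formulation used in CMTGAN, which this abstract does not specify: the Gaussian shape, the `sigma_frac` parameter, and the function names `centered_mask` and `masked_l1` are all assumptions chosen for illustration.

```python
import math


def centered_mask(height, width, sigma_frac=0.25):
    """Return a height x width grid of weights in (0, 1], largest at the centre.

    The Gaussian shape and sigma_frac are illustrative assumptions,
    not the mask defined in the paper.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = sigma_frac * height, sigma_frac * width
    return [
        [math.exp(-(((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2) / 2.0)
         for x in range(width)]
        for y in range(height)
    ]


def masked_l1(pred, target, mask):
    """Weighted mean absolute error between two same-sized grayscale images."""
    weighted_error = 0.0
    total_weight = 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            weighted_error += m * abs(p - t)
            total_weight += m
    return weighted_error / total_weight
```

Under such a weighting, an error of the same size costs more at the centre of the image than at a corner, which is one plausible way to push a generator to spend its capacity on the central region.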

Highlights

  • With visual sensors and computer vision development, gaze estimation technology can obtain gaze points with high accuracy [1]

  • We proposed a centered multi-task generative adversarial network (CMTGAN) to improve detection performance on small objects; it exploits points of interest provided by gaze estimation methods or by detectors (e.g., YOLOv4)

  • CNN-based super-resolution methods (e.g., ESRGAN, SPSR) may benefit small object detection, but they take a long time to super-resolve LR images, which prevents real-time detection

  • CMTGAN exhibited better object detection performance than Faster RCNN combined with traditional interpolation, with a similar inference time
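As an illustration of how a point of interest could be turned into an input region for super-resolution and detection, here is a minimal sketch that computes a fixed-size crop window centred on a given point. The function name `crop_box_around_point`, the default window size of 64 pixels, and the shift-at-the-border behaviour are hypothetical choices, not details taken from the paper.

```python
def crop_box_around_point(px, py, img_w, img_h, size=64):
    """Return (left, top, right, bottom) of a size x size window centred on (px, py).

    The window is shifted, not shrunk, when it would cross the image border,
    so the crop always has the same dimensions. The size is an assumed value.
    """
    if size > img_w or size > img_h:
        raise ValueError("crop size exceeds image size")
    half = size // 2
    left = min(max(px - half, 0), img_w - size)
    top = min(max(py - half, 0), img_h - size)
    return left, top, left + size, top + size
```

For example, `crop_box_around_point(100, 100, 640, 480)` yields a window that keeps the point at its centre, while a point near a corner yields a window pressed against that corner.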


Summary

Introduction

With the development of visual sensors and computer vision, gaze estimation technology can obtain gaze points with high accuracy [1]. Object detection algorithms such as YOLOv4 [5] and Faster RCNN [6] produce low-confidence predictions with noticeable localization errors on small objects. Combining object detection with gaze estimation can significantly improve small object detection performance. Object detection algorithms have achieved impressive accuracy and efficiency in detecting large objects, but a large gap remains between their performance on small and large objects in both recall and accuracy. To improve detection performance on small objects, SSD [7] uses feature maps from shallow layers. SODMTGAN [10] takes ROIs as input and predicts the categories and locations of objects.

