Abstract

Current object detection techniques have difficulty detecting small objects and achieve low accuracy on occluded objects. To solve these problems, this paper proposes an object detection framework named FFAN, which is based on Faster R-CNN and introduces a feature fusion network and an adversary occlusion network into the structure. The feature fusion network combines a feature map with low resolution and high semantic information with a feature map with high resolution and low semantic information using the deconvolution operation, increasing the network's ability to extract low-level features. FFAN then generates a single advanced feature map with both high resolution and high semantic information, which is used to detect small objects in the image more effectively. The adversary occlusion network creates occlusion on a deep feature map of the object, generating adversary training samples that are difficult for the detector to discriminate. At the same time, the detector learns to accurately classify the generated occluded adversary samples. The two networks compete with and learn from each other, further improving the performance of the algorithm. We train FFAN on the PASCAL VOC 2007, PASCAL VOC 2012, MS COCO and KITTI datasets. A number of quantitative and qualitative experiments show that FFAN achieves state-of-the-art detection accuracy.
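The fusion step described above can be sketched in a few lines. This is a minimal illustration only: it uses fixed nearest-neighbor upsampling and element-wise addition, whereas FFAN uses a learned deconvolution (transposed convolution) inside the network, and the array shapes here are invented toy values.

```python
import numpy as np

def upsample_2x(feat):
    """2x nearest-neighbor upsampling of a 2-D feature map.
    (Stands in for FFAN's learned deconvolution operation.)"""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def fuse(low_res_high_sem, high_res_low_sem):
    """Combine an upsampled deep feature map with a shallow one,
    yielding a single map with high resolution and high semantics."""
    up = upsample_2x(low_res_high_sem)
    assert up.shape == high_res_low_sem.shape
    return up + high_res_low_sem

deep = np.ones((4, 4))     # low resolution, high semantic information
shallow = np.ones((8, 8))  # high resolution, low semantic information
fused = fuse(deep, shallow)
print(fused.shape)  # (8, 8): small-object detail preserved at full resolution
```

In practice the fusion is performed on multi-channel convolutional feature maps and the combined map feeds the region proposal and detection heads.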

Highlights

  • In recent years, with the rapid development of deep neural networks, object detection technology based on deep learning has made great progress

  • To solve the problem of poor detection results for small objects, and to provide a way of generating samples with different occlusions instead of generating the pixels directly, this paper designs a new object detection framework named FFAN, based on Faster R-CNN [4], which introduces a feature fusion network into the structure and includes an adversary occlusion network that creates occlusion on the deep feature map of an object after the multilayer features are fused, in order to improve the detection accuracy of partially occluded objects

  • As the FFAN detection network is based on Faster R-CNN, we briefly review the Faster R-CNN network before describing the structure of the feature fusion network


Summary

INTRODUCTION

With the rapid development of deep neural networks, object detection technology based on deep learning has made great progress. A number of methods have been used to generate a variety of images [14]–[22], and a collection of partially occluded object instances can be generated by Generative Adversarial Networks (GANs), which can produce realistic images. However, this is not a reliable solution, because generating these images requires a large number of similar training samples. To solve the problem of poor detection results for small objects, and to provide a way of generating samples with different occlusions instead of generating the pixels directly, this paper designs a new object detection framework named FFAN. FFAN is based on Faster R-CNN [4], introduces a feature fusion network into the structure, and includes an adversary occlusion network that creates occlusion on the deep feature map of an object after the multilayer features are fused, in order to improve the detection accuracy of partially occluded objects.
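The key idea of occluding features rather than pixels can be sketched as follows. This is an illustrative toy only: the occlusion location and size are fixed by hand here, whereas the actual adversary occlusion network learns which spatial block to drop so as to maximize the detector's loss, and the feature-map dimensions below are invented.

```python
import numpy as np

def occlude_features(feat, top, left, size):
    """Zero out a square spatial block of a feature map of shape
    (H, W, C), simulating occlusion at the feature level rather
    than generating occluded image pixels."""
    occluded = feat.copy()
    occluded[top:top + size, left:left + size, :] = 0.0
    return occluded

# Toy RoI feature map, strictly positive so the occlusion is visible.
feat = np.random.rand(8, 8, 16) + 0.1
adv = occlude_features(feat, top=2, left=2, size=3)
print(adv[2:5, 2:5].sum())  # 0.0: the occluded block is dropped
print(adv.shape)            # (8, 8, 16): shape is unchanged
```

Such feature-level occluded samples are cheap to produce and need no extra training images, which is the advantage claimed over GAN-based pixel generation.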

RELATED WORK
ADVERSARY LEARNING
NETWORK ARCHITECTURE
EXPERIMENTAL RESULTS AND ANALYSIS
CONCLUSION