Abstract

Object detection is an important research direction in computer vision. In recent years, object detection has made great advances on public datasets, and algorithm performance is approaching human capability. The improvements in this paper are built on the two-stage object detection framework. The proposed model contains three main innovations. First, because conventional backbones trade off feature-map resolution against receptive-field size, a deep dilated convolution network is added to the backbone to replace the conventional residual module, forming our own backbone structure, the Deep_Dilated Convolution Network (D_dNet). D_dNet greatly reduces the number of model parameters during training and improves the quality of feature extraction. Second, making the second-stage network “thick” to improve accuracy increases computation and reduces detection speed; therefore, the pretrained feature map is compressed and the original two fully connected layers are replaced with a single 81-class fully connected layer, forming a lightweight network. This breaks with the tradition of an overly thick fully connected head, which not only makes the network lighter but also significantly increases detection speed. Finally, given the good hierarchical structure and transferability of deep convolutional neural networks, transfer learning is added to the training of the model for further optimization: pretrained weights are transferred, which effectively augments the training samples and increases the stability of the trained model. The entire model is evaluated on the MSCOCO and PASCAL VOC datasets, where its accuracy improves by 1.3%-2.2%.
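The abstract does not give implementation details for the dilated-convolution backbone. As a minimal PyTorch sketch of the underlying idea, a 3x3 convolution with dilation d and padding d enlarges the effective kernel to (2d+1)x(2d+1), growing the receptive field without reducing feature-map resolution; the block name, channel width, and dilation rate below are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Hypothetical residual-style block whose 3x3 conv is dilated.

    With dilation=d and padding=d, the 3x3 kernel covers a (2d+1)x(2d+1)
    region while the spatial size of the feature map is preserved --
    the resolution/receptive-field trade-off the abstract targets.
    """

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity skip connection, as in the residual module being replaced.
        return self.relu(x + self.bn(self.conv(x)))

x = torch.randn(1, 256, 38, 38)
y = DilatedBlock(256, dilation=2)(x)
assert y.shape == x.shape  # resolution preserved, receptive field enlarged
```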
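The lightweight head is described only at a high level: compress the pretrained feature map and replace the usual two fully connected layers with one 81-class layer (80 MS COCO categories plus background). A sketch under those assumptions follows; the 1x1-convolution compression, the compressed width of 256, and the 7x7 RoI size are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn

class LightweightHead(nn.Module):
    """Sketch of a compressed detection head with a single 81-way
    classifier. The compression width (256) is an assumption, not a
    value reported by the paper."""

    def __init__(self, in_channels: int = 1024, roi_size: int = 7,
                 compressed: int = 256, num_classes: int = 81):
        super().__init__()
        # 1x1 conv compresses the pretrained feature map channel-wise.
        self.compress = nn.Conv2d(in_channels, compressed, kernel_size=1)
        # One FC layer replaces the traditional pair of heavy FC layers.
        self.fc = nn.Linear(compressed * roi_size * roi_size, num_classes)

    def forward(self, rois: torch.Tensor) -> torch.Tensor:
        z = self.compress(rois)                  # (N, 256, 7, 7)
        return self.fc(z.flatten(start_dim=1))   # (N, 81) class scores

scores = LightweightHead()(torch.randn(8, 1024, 7, 7))
assert scores.shape == (8, 81)
```

Because the single FC layer operates on a compressed feature map, both its parameter count and its per-RoI multiply-accumulate cost shrink, which is consistent with the speedup the abstract claims.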
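The transfer-learning step is described only as a transfer of weights. A common realization, shown here purely as an assumed sketch, is to initialize the backbone from ImageNet-pretrained weights and freeze the early layers so that only the later layers and the new head are trained; the choice of ResNet-50 and of which layers to freeze are illustrative, not the paper's configuration.

```python
import torchvision

# Hypothetical sketch: load ImageNet-pretrained weights into the backbone,
# then freeze everything except the last stage and the classifier so the
# transferred weights stay fixed during detector training.
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
for name, param in backbone.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False  # transferred weights are not updated
```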
