Abstract

In surveillance scenarios, the parameters of different cameras vary greatly, which makes person re-identification (Re-ID) highly susceptible to factors such as scale variation, blur, and occlusion. To alleviate these problems, this paper proposes a Dense Feature Pyramid Network (DFPN), which converges to good performance even without pretraining. Specifically, DFPN is composed of three main parts. First, a new Residual Convolutional Block (RCB) is designed by following the construction of the ResBlock. Taking RCB as the basic unit and combining it with the convolutional layer structure of VGGNet, we build the backbone RVNet (Residual VGGNet), which enables rapid convergence and mitigates gradient vanishing. Second, building on the Feature Pyramid Network, we design a Dense Pyramid Fusion Module that adopts the connection pattern of DenseNet, improving the richness and scale diversity of feature maps by combining semantic information with detail information. Finally, to enlarge the receptive field of the feature maps, we introduce the Improved RFB (IRFB), an improved retinal receptive-field structure based on the Receptive Field Block (RFB), which effectively addresses the problem of pedestrian occlusion. In experiments on the public datasets Market1501, DukeMTMC-reID, and Occluded-Duke, Rank-1 accuracy reaches 94.12%, 87.25%, and 51.72% with pretraining, respectively. A series of ablation and comparative experiments demonstrates the effectiveness of our modules and of the overall scheme.
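The RCB described above builds on the residual connection from ResBlock, whose core idea is to add an identity shortcut around a learned transformation so gradients can flow directly through the skip path. The abstract does not give the exact RCB layout, so the following is only a minimal sketch of the generic residual-connection pattern, with a plain Python function standing in for a convolutional transform:

```python
# Minimal sketch of a residual connection (the ResBlock idea that RCB
# builds on). The actual RCB convolution layout is not specified here.
def residual_block(x, transform):
    # Output = learned transform of x plus the identity shortcut.
    # The shortcut lets gradients bypass the transform during backprop,
    # which is what mitigates gradient vanishing in deep networks.
    return transform(x) + x

# Toy usage: a scalar stands in for a feature map, a doubling function
# stands in for the block's convolution stack.
out = residual_block(3.0, lambda v: 2 * v)
print(out)
```

In a real network the `transform` would be a stack of convolution, normalization, and activation layers, and the shortcut would be the unmodified input feature map.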
