Abstract

Person re-identification remains a challenging task because of occlusion, blurring, and variations in posture. The key to effective person re-identification is to capture sufficiently detailed features of a person's appearance in images. Unlike previous methods, our method focuses on fusing different visual cues using only features from different levels and scales, without additional assistance. The major contributions of our paper are a mixed pooling strategy with different kernels and a mixed loss function. First, we adopt ResNet50 as our backbone, slightly modified so that it does not use the down-sampling operation at the beginning of stage 4. Inspired by the pyramid pooling structure, we pass the outputs of Res4 and Res5 separately through average pooling and max pooling layers with different kernels and strides. Second, we combine the averaged triplet losses and the averaged softmax losses as the final loss of the whole network. Extensive experiments on three datasets (CUHK03, Market1501, DukeMTMC-reID) show that our model achieves higher accuracy than many state-of-the-art methods from recent years.
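The two contributions named above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`pool2d`, `mixed_pool_branches`, `triplet_loss`, `softmax_cross_entropy`, `mixed_loss`), the margin of 0.3, the treatment of each pooled output as its own branch, and the unweighted sum of the two averaged losses are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def pool2d(fmap, kernel, stride, mode):
    """Pool a 2D map (list of equal-length rows) with a square window, no padding."""
    if mode == "max":
        reduce = max
    else:
        reduce = lambda window: sum(window) / len(window)
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for i in range(0, rows - kernel + 1, stride):
        out_row = []
        for j in range(0, cols - kernel + 1, stride):
            window = [fmap[i + di][j + dj]
                      for di in range(kernel) for dj in range(kernel)]
            out_row.append(reduce(window))
        out.append(out_row)
    return out


def mixed_pool_branches(fmap, configs):
    """Return one flattened feature vector per (kernel, stride, avg-or-max) branch.

    Keeping the branches separate is an assumption of this sketch.
    """
    branches = []
    for kernel, stride in configs:
        for mode in ("avg", "max"):
            pooled = pool2d(fmap, kernel, stride, mode)
            branches.append([v for row in pooled for v in row])
    return branches


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet loss on Euclidean distances (margin is hypothetical)."""
    d_ap = math.sqrt(sum((a - p) ** 2 for a, p in zip(anchor, positive)))
    d_an = math.sqrt(sum((a - n) ** 2 for a, n in zip(anchor, negative)))
    return max(0.0, d_ap - d_an + margin)


def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mixed_loss(triplet_losses, softmax_losses):
    """Averaged triplet losses plus averaged softmax losses (unweighted sum assumed)."""
    avg_triplet = sum(triplet_losses) / len(triplet_losses)
    avg_softmax = sum(softmax_losses) / len(softmax_losses)
    return avg_triplet + avg_softmax
```

In this sketch, a single-channel feature map stands in for the Res4 or Res5 output. Calling `mixed_pool_branches(fmap, [(4, 4), (2, 2)])` yields four branches, and the per-branch triplet and softmax losses would then be passed to `mixed_loss` to obtain one scalar for the whole network.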
