Abstract

The objective of person re-identification (Re-ID) is to match specific individuals across different settings and camera views. Although convolutional neural networks (CNNs) have been effective, they lose significant information through downsampling operators (e.g., pooling) and ignore the correlation between global and local features, which compromises retrieval performance based on the Euclidean distance. We propose a parallelly mixed attention network (PMA-Net) to exploit the complementarity of spatial and channel information for Re-ID. First, we integrate self-attention with depthwise convolution in a parallel design to extract information along different dimensions. Second, we propose two interactive branches that entwine spatial and channel information during feature extraction. Third, we add a new embedding that incorporates non-visual inputs and reduces the feature bias caused by camera variance. Finally, we apply maximizing the minimal pairwise angles (MMA) regularization to the features of PMA-Net, which effectively encourages the angular diversity of feature vectors. We confirm the effectiveness of PMA-Net through an extensive set of ablation studies. Results on three benchmarks show the superiority of our model over current methods, with mAP/top-1 accuracies of 91.3%/96.1% on Market-1501, 83.3%/91.7% on DukeMTMC-ReID, and 66.2%/85.1% on MSMT17.
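As one concrete piece of the pipeline, the MMA regularizer over a batch of feature vectors can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function name and the exact penalty form are assumptions based on the common formulation of MMA, which penalizes each vector's largest cosine similarity to any other vector, thereby pushing the minimal pairwise angle to grow.

```python
import numpy as np

def mma_regularizer(features):
    """Sketch of MMA regularization (assumed formulation).

    features: (N, D) array of feature vectors.
    Returns the mean, over vectors, of the largest cosine similarity
    to any other vector; minimizing this maximizes the minimal
    pairwise angle, encouraging angular diversity.
    """
    # L2-normalize rows so dot products become cosine similarities
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    cos = normed @ normed.T
    # Exclude self-similarity (always 1) from the per-row maximum
    np.fill_diagonal(cos, -1.0)
    # For each vector, cosine to its angularly nearest neighbour
    nearest = cos.max(axis=1)
    return nearest.mean()
```

For mutually orthogonal features the penalty is 0; for collinear features it approaches 1, so adding it to the training loss discourages feature vectors from collapsing into the same direction.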
