Vehicle re-identification faces two primary challenges: large intra-class variations caused by different camera views, and subtle inter-class differences between vehicles of the same model. In this paper, we propose a Multi-View Progressive Graph Interaction Embedding Network (MP-GIEN) for vehicle re-identification. First, we design a global semantic extractor: using Graph Interaction Units (GI Units), we incorporate semantic context to support contextual reasoning over image regions and extract global semantic features of the target, with the global loss computed as a combination of ID loss and triplet loss. Second, we devise a local view-aware extractor: a U-Net parses each image into four views (front, rear, top, and side), the resulting features are aligned through mask average pooling, and a progressive training strategy is adopted to learn fine-grained information. Lastly, we design a feature fusion enhancer: we revise the standard triplet loss to avoid mismatches between local features, and by jointly optimizing the local triplet loss and the global loss we learn a visual feature embedding that reduces intra-instance distances while enlarging inter-instance differences. MP-GIEN thus captures stable, discriminative vehicle information under different views. Experiments on the VeRi-Wild dataset demonstrate that our model significantly outperforms state-of-the-art methods.
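As a minimal sketch of the mask average pooling step described above, assuming a per-view binary mask from the view parser and a backbone feature map (function name, shapes, and the `eps` guard are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def mask_average_pooling(feat, view_masks, eps=1e-6):
    """Pool a feature map into one aligned vector per parsed view.

    feat:       (C, H, W) backbone feature map.
    view_masks: (V, H, W) binary masks, one per parsed view
                (e.g. front, rear, top, side from a U-Net parser).
    Returns a (V, C) array of local view features.
    """
    pooled = []
    for m in view_masks:
        area = m.sum() + eps  # guard against division by zero for absent views
        pooled.append((feat * m[None]).sum(axis=(1, 2)) / area)
    return np.stack(pooled)
```

A view whose mask is empty (e.g. the rear of a vehicle seen from a front-facing camera) yields a near-zero vector, which illustrates why a triplet loss over local features must be revised to skip such unmatched views.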