In surveillance systems, the human body is often occluded by a variety of obstacles, so occluded person re-identification remains a long-standing challenge. Recent methods based on pose guidance or external semantic cues have improved feature representation and the resulting matching performance, but problems remain, such as weak model representation and unreliable semantic cues. To address these problems, we propose a feature extraction network, shared feature fusion with pose-guided and unsupervised semantic segmentation (SFPUS), which extracts more discriminative features and reduces occlusion noise during pedestrian matching. First, a multibranch joint feature extraction module (MFE) extracts feature sets containing pose information and high-order semantic information; it not only provides robust extraction but also precisely segments the occlusion from the body. Second, to obtain multiscale discriminative features, a multiscale correlation feature matching fusion module (MCF) matches the two feature sets, and a Pose–Semantic Fusion Loss is designed to measure the similarity of the feature sets across the two modalities and fuse them into a single feature set. Third, to cope with image occlusion, we use unsupervised cascade clustering to suppress occlusion interference. Finally, we compare the proposed method with existing methods on the Occluded-Duke, Occluded-ReID, Market-1501 and Duke-MTMC datasets, reaching Rank-1 accuracies of 65.7%, 80.8%, 94.8% and 89.6% and mAP accuracies of 58.8%, 72.5%, 91.8% and 80.1%, respectively. These results demonstrate that SFPUS is promising and performs favorably against state-of-the-art methods.
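To make the cross-modal fusion step concrete, the sketch below illustrates one plausible reading of the Pose–Semantic Fusion Loss: normalize the pose-guided and semantic feature sets, align them by cosine similarity, and fuse the aligned features while penalizing cross-modal disagreement. This is a minimal illustration only; the function name, tensor shapes, temperature parameter, and averaging scheme are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pose_semantic_fusion(pose_feats: torch.Tensor,
                         sem_feats: torch.Tensor,
                         temperature: float = 0.1):
    """Fuse pose-guided and semantic feature sets by similarity weighting.

    pose_feats: (B, N, D) -- B images, N pose-branch features, D channels.
    sem_feats:  (B, M, D) -- M semantic-branch features per image.
    All names and the weighting scheme are illustrative assumptions.
    """
    p = F.normalize(pose_feats, dim=-1)
    s = F.normalize(sem_feats, dim=-1)
    # Cross-modal cosine similarity between the two feature sets.
    sim = torch.einsum('bnd,bmd->bnm', p, s)             # (B, N, M)
    # Soft alignment: each pose feature attends to semantic features.
    attn = torch.softmax(sim / temperature, dim=-1)
    aligned_sem = torch.einsum('bnm,bmd->bnd', attn, s)  # (B, N, D)
    # Fuse the two modalities into one feature set (simple average here).
    fused = 0.5 * (p + aligned_sem)
    # Similarity loss pulls matched cross-modal features together.
    loss = (1.0 - (p * aligned_sem).sum(-1)).mean()
    return fused, loss
```

In this reading, the loss term encourages the two branches to agree on occlusion-free body regions, so the fused set inherits the pose branch's localization and the semantic branch's robustness to clutter.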