Abstract
Person re-identification has become a challenging task due to various factors. One key to effective person re-identification is the extraction of discriminative features of a person's appearance. Most previous deep-learning-based works extract pedestrian features from neural networks, but only from the top feature layer. However, lower-layer features can be more discriminative in certain circumstances. Hence, we propose a method, named the multi-level feature network with multiple losses (MFML), which has a multi-branch network architecture consisting of multiple middle layers and one top layer for feature representation. To extract discriminative middle-layer features and benefit the deeper layers, we train the middle-layer features with a triplet loss function. For the top layer, we focus on learning more discriminative feature representations, so we train the top-layer feature with a hybrid loss (HL) function. Instead of concatenating multi-level features directly, we concatenate the weighted middle-layer features and the weighted top-layer feature to form the discriminative descriptor in the testing phase. Extensive evaluations on three datasets show that our method achieves accuracy competitive with state-of-the-art methods.
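To make the test-time feature assembly concrete, here is a minimal PyTorch-style sketch of concatenating weighted middle-layer and top-layer embeddings into one descriptor; the function name build_descriptor, the embedding sizes, and the weight values are illustrative assumptions rather than the paper's actual settings.

# Sketch: weighted concatenation of multi-level embeddings at test time.
import torch

def build_descriptor(middle_feats, top_feat, middle_weights, top_weight):
    # middle_feats   : list of tensors, each of shape (batch, d_i)
    # top_feat       : tensor of shape (batch, d_top)
    # middle_weights : one scalar weight per middle branch
    # top_weight     : scalar weight for the top branch
    parts = [w * f for w, f in zip(middle_weights, middle_feats)]
    parts.append(top_weight * top_feat)
    return torch.cat(parts, dim=1)

# Example: two middle branches (512-d each) and one top branch (2048-d).
f_mid = [torch.randn(4, 512), torch.randn(4, 512)]
f_top = torch.randn(4, 2048)
descriptor = build_descriptor(f_mid, f_top, middle_weights=[0.5, 0.5], top_weight=1.0)
print(descriptor.shape)  # torch.Size([4, 3072])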
Highlights
Person re-identification (Re-ID) aims to establish correspondences among observations of the same person across non-overlapping cameras and short temporal periods
On average, there are 17.2 images per identity, with varying appearances
We employ the Cumulative Matching Characteristic (CMC) curve and mean Average Precision (mAP), which are widely used in the person Re-ID literature (a minimal computation sketch follows this list)
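As a concrete illustration of these metrics, the sketch below computes single-gallery-shot CMC and mAP from a query-gallery distance matrix with NumPy; the cmc_map helper is hypothetical and omits the camera-ID filtering used in standard benchmark protocols, so it is illustrative rather than the paper's evaluation code.

import numpy as np

def cmc_map(dist, q_ids, g_ids, topk=10):
    # dist: (num_query, num_gallery) distance matrix; q_ids/g_ids: identity labels.
    num_q = dist.shape[0]
    cmc = np.zeros(topk)
    aps = []
    for i in range(num_q):
        order = np.argsort(dist[i])             # gallery ranked by similarity
        matches = (g_ids[order] == q_ids[i])    # boolean hit vector per rank
        if not matches.any():                   # queries with no gallery match are skipped
            continue
        first_hit = int(np.argmax(matches))
        if first_hit < topk:
            cmc[first_hit:] += 1                # counts toward rank-k accuracy
        hits = np.cumsum(matches)
        precision = hits / (np.arange(matches.size) + 1)
        aps.append(float((precision * matches).sum() / matches.sum()))
    return cmc / num_q, float(np.mean(aps))

# Example with random data: 5 queries, 20 gallery images, 4 identities.
rng = np.random.default_rng(0)
dist = rng.random((5, 20))
q_ids, g_ids = rng.integers(0, 4, 5), rng.integers(0, 4, 20)
cmc, mAP = cmc_map(dist, q_ids, g_ids)
print("rank-1:", cmc[0], "mAP:", mAP)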
Summary
Person re-identification (Re-ID) aims to establish correspondences among observations of the same person across non-overlapping cameras and short temporal periods. Most existing deep-learning-based person Re-ID methods extract features from the top level of the trained network because these features are strongly discriminative. A deep neural network consists of multiple feature extraction layers, and the visual semantics of the feature maps become more abstract when moving from the bottom to the top layers (see Fig. 1). We utilize a triplet loss function behind each middle feature extraction layer to improve the representation ability of the low-level features.
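Below is a hedged PyTorch sketch of this multi-loss training idea: a triplet loss is attached to the pooled embedding of each middle stage, while the top-layer embedding additionally receives an identity (cross-entropy) term standing in for the hybrid loss. The tiny three-stage backbone, the choice of stages, the batch layout of (anchor, positive, negative) triplets, and the exact composition of the hybrid loss are assumptions made for illustration, not the paper's design.

import torch
import torch.nn as nn

class MultiLevelNet(nn.Module):
    # Toy three-stage backbone; each stage feeds its own embedding branch.
    def __init__(self, num_ids=751, dim=128):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU()),
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.embed = nn.ModuleList([nn.Linear(c, dim) for c in (32, 64, 128)])
        self.classifier = nn.Linear(dim, num_ids)   # identity head on the top feature

    def forward(self, x):
        feats = []
        for stage, emb in zip(self.stages, self.embed):
            x = stage(x)
            feats.append(emb(self.pool(x).flatten(1)))   # one embedding per level
        return feats, self.classifier(feats[-1])

triplet = nn.TripletMarginLoss(margin=0.3)
ce = nn.CrossEntropyLoss()

def total_loss(feats, logits, labels):
    loss = ce(logits, labels)            # identity term on the top layer (assumed)
    for f in feats:                      # triplet term on every feature level
        a, p, n = f.chunk(3, dim=0)      # batch assumed ordered as (anchors, positives, negatives)
        loss = loss + triplet(a, p, n)
    return loss

# Example forward/backward pass on 4 triplets of 128x64 pedestrian crops.
net = MultiLevelNet()
imgs = torch.randn(12, 3, 128, 64)
labels = torch.randint(0, 751, (12,))
feats, logits = net(imgs)
total_loss(feats, logits, labels).backward()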