Abstract

We propose a parallel network with spatial–temporal attention for video-based person re-identification (re-id). Many previous video-based re-id methods first use two-dimensional convolutional neural networks to extract spatial features, and then extract temporal features by temporal pooling or recurrent neural networks. Unfortunately, such serial networks lose spatial information while extracting temporal information. In contrast, our parallel network extracts temporal and spatial features simultaneously, which effectively reduces this loss of spatial information. In addition, we design a global temporal attention module that obtains each frame's attention weight from the correlation between that frame and all frames in the sequence. The temporal module also guides feature extraction in the spatial module, which strengthens both the temporal and the spatial constraints. Experiments show that our method effectively improves re-id accuracy and outperforms state-of-the-art methods.
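The abstract describes the global temporal attention module only at a high level. The following is a minimal sketch in plain Python of one way such frame-level weighting could work. It assumes that "correlation" is a scaled dot product between per-frame feature vectors, that a frame's score is its mean correlation with every frame in the sequence, and that the scores are normalized with a softmax. These choices, and the names `global_temporal_attention` and `aggregate`, are hypothetical illustrations and not the paper's exact formulation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_temporal_attention(frames: List[Vector]) -> List[float]:
    """Return one attention weight per frame; the weights sum to 1.

    Hypothetical sketch: correlation is ASSUMED to be a dot product scaled
    by sqrt(feature_dim), and each frame's score is its mean correlation
    with ALL frames in the sequence (including itself).
    """
    n = len(frames)
    scale = math.sqrt(len(frames[0]))
    scores = []
    for current in frames:
        corr = [dot(current, other) / scale for other in frames]
        scores.append(sum(corr) / n)
    return softmax(scores)


def aggregate(frames: List[Vector], weights: List[float]) -> Vector:
    """Attention-weighted sum of frame features -> one sequence-level feature."""
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
```

For example, with `frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]`, the two mutually consistent frames receive larger weights than the outlier frame, so `aggregate(frames, global_temporal_attention(frames))` leans toward `[1.0, 0.0]`. Scoring each frame against the whole sequence, rather than only its neighbours, is what makes the weighting "global" in this sketch.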
