Abstract

Video-based person re-identification (Re-ID) aims to match sequences of the same person captured across non-overlapping surveillance areas. Effectively embedding spatial and temporal information into the video feature representation is essential yet challenging. On one hand, we observe that different frames in a video provide complementary information for each other: local features that are lost due to target occlusion or visual ambiguity in one frame can be supplemented by the same pedestrian part in other frames. On the other hand, a graph neural network enables contextual interactions between relevant regional features. Therefore, we propose a novel sparse graph wavelet convolution neural network (SGWCNN) for video-based person Re-ID. Distinct from previous graph-based Re-ID methods, we exploit a weighted sparse graph to model the semantic relations among the local patches of pedestrians in the video, so that each local patch in one frame can draw supplementary information from highly related patches in other frames. Moreover, to effectively handle short-term occlusion and pedestrian misalignment, the graph wavelet convolution neural network is adopted for feature propagation, refining regional features iteratively. Experiments and evaluation on three challenging benchmarks, namely MARS, DukeMTMC-VideoReID, and iLIDS-VID, show that the proposed SGWCNN effectively improves the performance of video-based person re-identification.
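To make the two core ideas concrete, the sketch below illustrates (1) a weighted sparse graph over patch features built from top-k cosine similarities and (2) one graph-wavelet convolution layer using heat-kernel wavelets on the graph Laplacian, in the spirit of graph wavelet neural networks. This is a minimal illustration under assumed details: the patch count, sparsity level `k`, wavelet scale `s`, filter parameterization, and feature dimensions are hypothetical choices for exposition, not the paper's actual architecture or hyperparameters.

```python
# Minimal NumPy sketch of the abstract's two ideas (illustrative, not the paper's code):
# (1) a weighted sparse graph linking semantically related pedestrian patches,
# (2) one graph-wavelet convolution that propagates features among related patches.
import numpy as np

def sparse_patch_graph(X, k=4):
    """Weighted sparse adjacency: keep the top-k cosine similarities per patch."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    S = Xn @ Xn.T                                 # pairwise cosine similarity
    np.fill_diagonal(S, 0.0)
    A = np.zeros_like(S)
    idx = np.argsort(-S, axis=1)[:, :k]           # k most similar patches per row
    rows = np.arange(S.shape[0])[:, None]
    A[rows, idx] = S[rows, idx]                   # sparsify: drop weak relations
    return np.maximum(A, A.T)                     # symmetrize

def wavelet_conv(X, A, W, s=1.0):
    """One graph-wavelet convolution: X' = ReLU(psi_s diag(f) psi_s^{-1} X W)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-8))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized graph Laplacian
    lam, U = np.linalg.eigh(L)                         # spectral decomposition
    psi     = U @ np.diag(np.exp(-s * lam)) @ U.T      # heat-kernel wavelet basis
    psi_inv = U @ np.diag(np.exp( s * lam)) @ U.T      # its inverse
    f = np.ones(len(A))                                # spectral filter (learnable; identity here)
    return np.maximum(psi @ np.diag(f) @ psi_inv @ X @ W, 0.0)

# Toy run: 8 patches (e.g. 4 horizontal stripes x 2 frames), 16-d features.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))                  # patch features
W = rng.standard_normal((16, 16)) * 0.1           # layer weights
A = sparse_patch_graph(X, k=3)
out = wavelet_conv(X, A, W, s=0.5)
print(out.shape)                                  # (8, 16): refined patch features
```

In this sketch, an occluded patch with weak features still receives signal from its top-k most similar patches in other frames through the wavelet propagation step, which is the intuition the abstract gives for handling short-term occlusion and misalignment.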
