Abstract

As one of the most important areas of public safety and security, intelligent video surveillance is an indispensable part of the urban Internet of Things infrastructure. Person re-identification (person re-ID), which aims to track and recognize a person across a multi-camera scene, is mostly viewed as an image retrieval problem, and the task has been greatly advanced by deep convolutional neural networks (CNNs) in recent years. In practice, person re-ID usually relies on automatic detectors to obtain cropped pedestrian images, yet CNNs are inherently limited in modeling geometric transformations because of the fixed geometric structure of their building modules. We incorporate a deformable convolution module into the traditional baseline to enhance its transformation modeling capability without additional supervision. The new module can readily replace its plain counterparts in existing CNNs and can be easily trained end-to-end by standard backpropagation. Experiments on two large-scale re-ID datasets demonstrate the effectiveness of our approach. The experiments also show that learning dense spatial transformations in deep CNNs is effective for the person re-ID task and is a promising direction for intelligent video surveillance.
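
To make the idea of a deformable convolution concrete, the following is a minimal pure-Python sketch, not the implementation used in the paper. It computes one output value of a 3x3 deformable convolution: each sampling location of the regular grid is shifted by a fractional offset and read with bilinear interpolation. The function names `bilinear_sample` and `deformable_conv_at` are hypothetical, and the offsets are passed in by hand here, whereas in a network they would be predicted by an additional convolutional layer and learned by backpropagation.

```python
import math


def bilinear_sample(img, y, x):
    """Sample a 2-D list `img` at fractional (y, x); out-of-range pixels count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours, weighted by their distance to (y, x).
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * img[yi][xi]
    return value


def deformable_conv_at(img, kernel, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th location of the regular 3x3 grid.
    Assumption for illustration: offsets are supplied by the caller; a real
    layer would predict them from the input feature map.
    """
    out = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear_sample(img, cy + ky + dy, cx + kx + dx)
            out += kernel[ky + 1][kx + 1] * sample
            k += 1
    return out


# Toy 5x5 input whose value grows by 1 per column and by 5 per row.
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

zero_offsets = [(0.0, 0.0)] * 9    # reduces to a plain 3x3 convolution
right_offsets = [(0.0, 0.5)] * 9   # every sample shifted half a pixel right

plain = deformable_conv_at(image, mean_kernel, zero_offsets, 2, 2)
shifted = deformable_conv_at(image, mean_kernel, right_offsets, 2, 2)
```

With all offsets set to zero the function behaves like an ordinary 3x3 convolution (cross-correlation, as in most CNN frameworks), which is why such a module can stand in for a plain convolutional layer; non-zero offsets let the sampling grid follow geometric variation in the input.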
