Abstract
Existing deep learning methods for person re-identification (re-id) rely heavily on large and computationally expensive convolutional neural networks. They therefore do not scale to large re-id deployment scenarios that require processing large amounts of surveillance video data, because inference is lengthy and computationally costly. In this work, we address this limitation by jointly learning re-id attention selection. Specifically, we formulate a novel harmonious attention network (HAN) framework that jointly learns soft pixel attention and hard region attention alongside deep feature representations, enabling more discriminative re-id matching with efficient networks and more scalable model inference and feature matching. Extensive evaluations on four large benchmark datasets (CUHK03, Market-1501, DukeMTMC, and MSMT17) validate the superior cost-effectiveness of the proposed HAN approach for person re-id against a wide variety of state-of-the-art methods.
Highlights
Person re-identification aims to search people across non-overlapping surveillance camera views deployed at different locations by matching auto-detected person bounding box images
The proposed approach is technically orthogonal to existing designs of efficient neural networks, so the complementary strengths of both can be combined by integrating them in a hybrid architecture
We propose a harmonious attention network (HAN) framework to simultaneously learn hard region-level and soft pixel-level attention along with re-id feature representations, maximising the correlated complementary information between attention selection and feature discrimination in a compact architecture
We present a cost-effective Harmonious Attention Network (HAN) framework for joint learning of person re-identification attention selection and feature representations
Summary
Person re-identification (re-id) aims to search people across non-overlapping surveillance camera views deployed at different locations by matching auto-detected person bounding box images. With the 24/7 operating nature of surveillance cameras, person re-id is intrinsically a large scale search problem with a fundamental requirement for developing systems with both fast data throughput (i.e. low inference cost) and high matching accuracy. This is because model accuracy and inference efficiency are both key enabling factors for affordable real-world person re-id applications. Earlier person re-id methods in the literature rely on slow-to-compute, high-dimensional hand-crafted features with inferior model performance, yielding unsatisfactory solutions (Zheng et al 2013; Liao et al 2015; Matsukawa et al 2016; Zhang et al 2016; Wang et al 2018b). The recent introduction of large scale person re-id datasets (Wei et al 2018; Zheng et al 2015a; Li et al 2014; Ristani et al 2016) allows for a natural utilisation of increasingly powerful deep neural networks (He et al 2016; Szegedy et al 2017; Huang et al 2017), substantially improving person re-id accuracy in a single system pipeline.
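To make the search formulation above concrete, the following is a minimal sketch of re-id matching as ranking gallery images by feature similarity. It is not the HAN model: the feature vectors, the image ids, and the helper names `l2_normalise` and `rank_gallery` are hypothetical, and cosine similarity is assumed as the matching score purely for illustration.

```python
import math

def l2_normalise(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_gallery(query, gallery):
    """Return gallery ids sorted from most to least similar to the query.

    `query` is a feature vector; `gallery` maps a hypothetical image id
    to its feature vector. Similarity is the cosine of the two vectors.
    """
    q = l2_normalise(query)
    scores = {}
    for image_id, feat in gallery.items():
        g = l2_normalise(feat)
        scores[image_id] = sum(a * b for a, b in zip(q, g))
    return sorted(scores, key=scores.get, reverse=True)

# Toy 3-dimensional features (assumed values); in practice they would be
# produced by a trained re-id network from person bounding box images.
gallery = {
    "cam2_person_a": [0.9, 0.1, 0.0],
    "cam2_person_b": [0.0, 1.0, 0.2],
    "cam3_person_a": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
ranking = rank_gallery(query, gallery)
```

In this sketch the cost of one query grows with both the gallery size and the feature dimension, in addition to the cost of extracting each feature, which is why inference cost matters alongside accuracy in large scale deployments.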