Abstract

Person re-identification (PRe-id) aims to retrieve images of a target person captured across single or multiple non-overlapping cameras. To this end, numerous techniques have been proposed that extract handcrafted, deep, part-based, and ensemble features to obtain more discriminative patterns for matching. However, because these approaches pay limited attention to the multi-grained, view-consistent, and semantic correlations among different views, their performance remains limited. We therefore present an attention-based multi-view correlation learning framework, named ACLS, which learns multi-grained spatiotemporal features of individuals. ACLS comprises three key steps. First, multi-view correlated visual features of pedestrians are extracted using a correlation vision transformer (CVIT) and a pyramid dilated network (PDN), followed by a person attention mechanism. Next, we employ convolutional long short-term memory (ConvLSTM) to extract spatiotemporal information from pedestrian images captured in different time frames. Finally, a deep fusion strategy intelligently integrates these features for the final matching task. Extensive evaluations are conducted on three widely used benchmarks: Market-1501, DukeMTMC-reID, and CUHK03, where the framework achieves ranking accuracies of 93.7%, 90.4%, and 85.7%, respectively. These results demonstrate that our learning mechanism outperforms current state-of-the-art (SOTA) methods.
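
The abstract does not include an implementation, but the three-stage pipeline it outlines (attended spatial features, ConvLSTM temporal aggregation, fused embedding for matching) can be illustrated with a minimal PyTorch sketch. Everything here is an assumption for illustration: a small convolutional backbone stands in for the CVIT/PDN branch, and all names (ReIDSketch, ConvLSTMCell, feat_ch, emb_dim) are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: all four gates from one convolution."""

    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        # Input and previous hidden state are concatenated channel-wise.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel,
                               padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)   # update cell memory
        h = o * torch.tanh(c)           # emit new hidden state
        return h, c


class ReIDSketch(nn.Module):
    """Hypothetical sketch of the ACLS-style pipeline, not the paper's model."""

    def __init__(self, feat_ch=64, emb_dim=128):
        super().__init__()
        # Stand-in for the CVIT + PDN feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.Conv2d(feat_ch, 1, 1)      # per-location attention logits
        self.convlstm = ConvLSTMCell(feat_ch, feat_ch)
        self.embed = nn.Linear(feat_ch, emb_dim)  # fused embedding for matching

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        B, T, _, _, _ = clips.shape
        h = c = None
        for t in range(T):
            f = self.backbone(clips[:, t])         # spatial features per frame
            f = f * torch.sigmoid(self.attn(f))    # person attention re-weighting
            if h is None:
                h, c = torch.zeros_like(f), torch.zeros_like(f)
            h, c = self.convlstm(f, (h, c))        # spatiotemporal aggregation
        pooled = h.mean(dim=(2, 3))                # global average pooling
        return nn.functional.normalize(self.embed(pooled), dim=1)


# Usage: embed two 4-frame pedestrian clips and compare by cosine similarity.
model = ReIDSketch()
emb = model(torch.randn(2, 4, 3, 128, 64))
print(emb.shape)                # torch.Size([2, 128])
print(emb[0] @ emb[1])          # cosine similarity of the two embeddings
```

Because the embeddings are L2-normalized, matching reduces to ranking gallery images by cosine similarity to the query, which is how the reported ranking accuracies would typically be computed.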
