Abstract
Video-based person re-identification (Re-ID) remains a promising but challenging computer vision task, owing to the lack of discriminative features that aggregate both spatial and temporal information. In this paper, we propose a joint attentive spatial-temporal feature aggregation network (JAFN) for video-based person Re-ID, which simultaneously learns a quality- and frame-aware model to obtain attention-based spatial-temporal feature aggregation. Specifically, we utilize a CNN to learn spatial features, while introducing an LSTM to separately learn temporal features. For feature aggregation, we introduce two attention mechanisms that generate quality and frame significance scores, respectively: the quality score measures the quality of each image for attentive spatial feature aggregation, and the frame score measures how much each frame contributes to the temporal feature. We then apply set-pooling to aggregate both the quality-aware spatial features and the frame-aware temporal features based on these attentive scores. Residual learning is also introduced between the LSTM and the CNN for adaptive spatial-temporal feature fusion. Furthermore, we adopt data balancing to alleviate the class disproportions present in video-based Re-ID datasets. Extensive experiments on the PRID2011, i-LIDS-VID, and MARS datasets demonstrate the effectiveness of the proposed JAFN. Moreover, comparisons across different modules and features within JAFN show that our approach generalizes favorably in attentively aggregating both spatial and temporal features.
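To make the attention-weighted set-pooling idea concrete, below is a minimal sketch, assuming PyTorch. It shows how a learned significance score per frame can weight a set of per-frame features (e.g., CNN spatial features or LSTM temporal features) into a single aggregated feature. The module and variable names (AttentiveSetPooling, feat_dim, frame_feats) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of attention-weighted set pooling (assumed PyTorch).
# Names here are hypothetical; this is not the paper's official code.
import torch
import torch.nn as nn

class AttentiveSetPooling(nn.Module):
    """Aggregates a set of per-frame features into one sequence-level
    feature, weighting each frame by a learned significance score."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Small scoring head mapping each frame feature to a scalar score.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim)
        scores = self.score(frame_feats)          # (B, T, 1)
        weights = torch.softmax(scores, dim=1)    # normalize over frames
        # Weighted sum over the frame (set) dimension.
        return (weights * frame_feats).sum(dim=1)  # (B, feat_dim)

# Usage: pool per-frame CNN (spatial) or LSTM (temporal) outputs.
pool = AttentiveSetPooling(feat_dim=128)
feats = torch.randn(4, 8, 128)   # 4 tracklets, 8 frames, 128-d features
video_feat = pool(feats)         # (4, 128)
```

The same pooling structure can serve both branches: fed with CNN outputs and quality scores it yields the quality-aware spatial feature, and fed with LSTM outputs and frame scores it yields the frame-aware temporal feature.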