Non-overlapping Camera Views Research Articles

One of the key tasks for an intelligent visual surveillance system is to automatically re-identify objects of interest, e.g., persons or vehicles, across non-overlapping camera views. This demand has prompted extensive research on person re-identification (re-ID) and vehicle re-ID techniques, especially deep learning-based ones. While most recent algorithms focus on designing new convolutional neural networks, less attention is paid to the loss functions, which play an equally vital role. Triplet loss and softmax loss are the two most widely used losses, but both have limitations. Triplet loss optimizes the model to produce features under which samples from the same class have higher similarity than samples from different classes. Its problem is that the number of triplets that can be constructed grows cubically with the number of training samples, which causes scalability issues, unstable performance, and slow convergence. Softmax loss scales well and is widely used for large-scale classification problems. However, since softmax loss only aims to separate the training classes well, its performance on re-ID tasks is unsatisfactory, because the model is tested on measuring the similarity of samples from unseen classes. We propose the support neighbor (SN) loss, which avoids the limitations of these two losses. Unlike triplet loss, which is calculated from triplets, SN loss is derived from the K-nearest neighbors (support neighbors, SNs) of anchor samples. The SNs of an anchor are unique and carry more valuable contextual information about the anchor and its neighborhood structure, and thus contribute to more stable performance and a more reliable embedding from image space to feature space. Based on the SNs, a softmax-like separation term and a squeeze term are proposed, which encourage interclass separation and intraclass compactness, respectively. Experiments show that SN loss surpasses triplet and softmax losses with the same backbone network and, when combined with training tricks, reaches state-of-the-art performance for both person and vehicle re-ID using a ResNet50 backbone.
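As a rough illustration of the ideas in this abstract, the sketch below first counts how many (anchor, positive, negative) triplets a small labeled set yields, then computes a neighbor-based loss for one anchor with a softmax-like separation term and a squeeze term. The function names, the squared Euclidean distance, and the exact form of both terms are assumptions made for illustration; the abstract does not state the formulas, so this should not be read as the authors' SN loss.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def count_triplets(labels):
    """Count ordered (anchor, positive, negative) triplets in a labeled set."""
    total = 0
    for i, li in enumerate(labels):
        positives = sum(1 for j, lj in enumerate(labels) if j != i and lj == li)
        negatives = sum(1 for lj in labels if lj != li)
        total += positives * negatives
    return total

def neighbor_loss(features, labels, anchor, k):
    """Hypothetical neighbor-based loss for one anchor (illustrative only).

    Returns (separation, squeeze), or None if no same-class sample is
    among the anchor's k nearest neighbors.
    """
    others = [i for i in range(len(features)) if i != anchor]
    others.sort(key=lambda i: sq_dist(features[anchor], features[i]))
    neighbors = others[:k]
    positives = [i for i in neighbors if labels[i] == labels[anchor]]
    if not positives:
        return None
    dist = {i: sq_dist(features[anchor], features[i]) for i in neighbors}
    # Softmax-like separation (assumed form): share of the neighborhood's
    # similarity mass that belongs to same-class neighbors.
    num = sum(math.exp(-dist[i]) for i in positives)
    den = sum(math.exp(-dist[i]) for i in neighbors)
    separation = -math.log(num / den)
    # Squeeze (assumed form): spread of distances among same-class neighbors.
    squeeze = max(dist[i] for i in positives) - min(dist[i] for i in positives)
    return separation, squeeze

# Toy data: one same-class and one different-class neighbor at equal distance.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
labels = [0, 0, 1]
triplets_small = count_triplets([0, 0, 1, 1])
triplets_large = count_triplets([0, 0, 0, 1, 1, 1])
separation, squeeze = neighbor_loss(features, labels, anchor=0, k=2)
```

In this toy setup the neighbor-based loss only ever looks at `k` samples per anchor, whereas the triplet count depends on how many positives and negatives each anchor has in the whole set.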

In conventional person re-identification (re-id), the images in the training probe set and training gallery set are all assumed to be instance-level samples that are manually labeled from raw surveillance video (likely with the assistance of detection) in a frame-by-frame manner. Labeling raw surveillance video across multiple non-overlapping camera views in this way is expensive and time consuming. To overcome these issues, we consider a weakly supervised person re-id setting that aims to find the raw video clips in which a given target person appears. During training, given a sample of a person captured in one camera view, our approach trains a re-id model without further instance-level labeling of this person in another camera view. The weak setting refers to matching a target person with an untrimmed gallery video where we only know that the identity appears in the video, without annotating the identity in any frame of the video during training. Weakly supervised person re-id is challenging: it not only suffers from the difficulties of conventional person re-id (e.g., visual ambiguity and appearance variations caused by occlusions, pose variations, background clutter, etc.) but, more importantly, must also cope with weak supervision, because the instance-level labels and the ground-truth locations of person instances (i.e., their ground-truth bounding boxes) are absent. To solve the weakly supervised person re-id problem, we develop deep graph metric learning (DGML). On the one hand, DGML measures the consistency between intra-video spatial graphs of consecutive frames, where the spatial graph captures the neighborhood relationships among the detected person instances in each frame. On the other hand, DGML simultaneously distinguishes the inter-video spatial graphs captured from different camera views at different sites. To explicitly embed weak supervision into DGML, we introduce weakly supervised regularization (WSR), which utilizes multiple weak video-level labels to learn discriminative features by means of a weak identity loss and a cross-video alignment loss. We conduct extensive experiments to demonstrate the feasibility of the weakly supervised person re-id approach and its special cases (e.g., its bag-to-bag extension) and show that the proposed DGML is effective.
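To make the notion of a per-frame spatial graph more concrete, the sketch below builds a k-nearest-neighbor graph over the centers of detected person boxes in a frame and scores how consistent the graphs of two consecutive frames are. Everything here is an assumption for illustration: the function names, the use of box centers, the Jaccard overlap as the consistency score, and the premise that detections are index-aligned across frames. The abstract does not describe how DGML constructs or compares its graphs.

```python
def knn_edges(centers, k):
    """Undirected k-nearest-neighbor edges among box centers in one frame."""
    edges = set()
    for i, (xi, yi) in enumerate(centers):
        others = [j for j in range(len(centers)) if j != i]
        others.sort(
            key=lambda j: (centers[j][0] - xi) ** 2 + (centers[j][1] - yi) ** 2
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def graph_consistency(edges_a, edges_b):
    """Jaccard overlap of two edge sets; 1.0 means the graphs are identical."""
    union = edges_a | edges_b
    if not union:
        return 1.0
    return len(edges_a & edges_b) / len(union)

# Hypothetical box centers for four people in two consecutive frames
# (assumed to be listed in the same order in both frames).
frame_t = [(10, 10), (12, 11), (40, 42), (43, 40)]
frame_t1 = [(11, 10), (13, 12), (41, 41), (44, 41)]

edges_t = knn_edges(frame_t, k=1)
edges_t1 = knn_edges(frame_t1, k=1)
score = graph_consistency(edges_t, edges_t1)
```

Because people move only slightly between consecutive frames, the two graphs in this toy example share the same edges; a learned method would compare feature-based graphs rather than raw pixel positions.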

Articles published on Non-overlapping Camera Views

A Versatile Framework for Multi-Scene Person Re-Identification.

CMOT: A cross-modality transformer for RGB-D fusion in person re-identification with online learning capabilities

Cross-modal Person Re-identification Based on Hybrid Learning Networks

Multi-Level Progressive Learning for Unsupervised Vehicle Re-Identification

Gait-Assisted Video Person Retrieval

Mini-transformer with pooling for unsupervised domain adaptation person reidentification

Learning global and local features using graph neural networks for person re-identification

Vehicle and Person Re-Identification With Support Neighbor Loss.

Dual-Stream Guided-Learning via a Priori Optimization for Person Re-identification

Deep progressive attention for person re-identification

Deep Graph Metric Learning for Weakly Supervised Person Re-Identification.

PMT-Net: Progressive Multi-Task Network for one-shot Person Re-Identification

Person Reidentification via Unsupervised Cross-View Metric Learning.

Person Re-Identification Based on Graph Relation Learning

Real-Time Vehicle Orientation Classification and Viewpoint-Aware Vehicle Re-Identification

Part-based Structured Representation Learning for Person Re-identification

View-specific subspace learning and re-ranking for semi-supervised person re-identification

An Improved Deep Mutual-Attention Learning Model for Person Re-Identification

Enforcing Affinity Feature Learning through Self-attention for Person Re-identification

RGB-IR Person Re-identification by Cross-Modality Similarity Preservation
