CrowdAssign: A Label Assignment Scheme for Pedestrian Detection in Crowded Scenes

Abstract

Pedestrian detection in crowded scenes is a challenging problem. To avoid missing targets in a crowd, a high non-maximum suppression (NMS) threshold is necessary. However, a high NMS threshold may lead to a large number of false positives (FPs), which seriously degrades pedestrian detection performance. In this paper, we propose a novel label assignment scheme for one-stage detectors, called CrowdAssign, which focuses on FP suppression. First, we analyze the sources of FPs and classify them into different categories. Second, we examine the causes of the different types of FPs and develop a dynamic classification label adjustment strategy, which aims to reduce the confidence of positive samples with a high risk of being FPs. With our robust label assignment scheme, a single FCOS-ResNet-50 detector reaches 45.0 $\mathrm{MR}^{-2}$ on CrowdHuman under a 1x schedule and outperforms state-of-the-art label assignment methods. Experiments on CityPersons demonstrate the generalization ability of our algorithm.
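The threshold trade-off described in the abstract can be illustrated with a minimal sketch of standard greedy NMS. This is not the CrowdAssign method itself; the function names `iou` and `greedy_nms` and the toy boxes and scores are assumptions made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Keep boxes in descending score order, dropping any box whose IoU
    with an already kept box exceeds iou_threshold. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical toy scene: two distinct but overlapping pedestrians (indices 0
# and 1) plus a duplicate detection of the first pedestrian (index 2).
boxes = [(0, 0, 10, 20), (4, 0, 14, 20), (1, 0, 11, 20)]
scores = [0.9, 0.8, 0.7]

for threshold in (0.3, 0.5, 0.85):
    print(threshold, greedy_nms(boxes, scores, threshold))
```

In this toy scene, a threshold of 0.3 suppresses the second pedestrian (a missed target), 0.5 keeps both pedestrians and removes the duplicate, and 0.85 lets the duplicate through as a false positive.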

Similar Papers
  • Research Article
  • Cite Count Icon 117
  • 10.1609/aaai.v34i07.6690
PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes
  • Apr 3, 2020
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Cheng Chi + 5 more

Pedestrian detection in crowded scenes is a challenging problem, because occlusion happens frequently among different pedestrians. In this paper, we propose an effective and efficient detection network to hunt pedestrians in crowded scenes. The proposed method, namely PedHunter, introduces strong occlusion handling ability to existing region-based detection networks without bringing extra computations in the inference stage. Specifically, we design a mask-guided module to leverage the head information to enhance the feature representation learning of the backbone network. Moreover, we develop a strict classification criterion by improving the quality of positive samples during training to eliminate common false positives of pedestrian detection in crowded scenes. Besides, we present an occlusion-simulated data augmentation to enrich the pattern and quantity of occlusion samples to improve the occlusion robustness. As a consequence, we achieve state-of-the-art results on three pedestrian detection datasets including CityPersons, Caltech-USA and CrowdHuman. To facilitate further studies on occluded pedestrian detection in surveillance scenes, we release a new pedestrian dataset, called SUR-PED, with a total of over 162k high-quality manually labeled instances in 10k images. The proposed dataset, source codes and trained models are available at https://github.com/ChiCheng123/PedHunter.

  • Research Article
  • Cite Count Icon 6
  • 10.3390/app13148073
Multi-Attribute NMS: An Enhanced Non-Maximum Suppression Algorithm for Pedestrian Detection in Crowded Scenes
  • Jul 11, 2023
  • Applied Sciences
  • Wei Wang + 5 more

Removing duplicate proposals is a critical process in pedestrian detection, and is usually performed via Non-Maximum Suppression (NMS); however, in crowded scenes, the detection proposals of occluded pedestrians are hard to distinguish from duplicate proposals, making the detection results inaccurate. To address this problem, the authors propose a Multi-Attribute NMS (MA-NMS) algorithm, which combines density and count attributes to adaptively adjust suppression, effectively preserving the proposals of occluded pedestrians while removing duplicate proposals. To obtain the density and count attributes, an attribute branch (ATTB) is also proposed: it uses a context extraction module (CEM) to extract the context of pedestrians, and then concatenates the context with the features of pedestrians to predict both attributes simultaneously. With the proposed ATTB, a pedestrian detector based on MA-NMS is constructed for pedestrian detection in crowded scenes. Extensive experiments are conducted using the CrowdHuman and CityPersons datasets, and the results show that the proposed method outperforms mainstream methods on AP (average precision), Recall, and MR$^{-2}$ (log-average miss rate), validating the effectiveness of the proposed MA-NMS algorithm.
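Since several abstracts on this page report MR$^{-2}$, a minimal sketch of the log-average miss rate may help. It assumes the common convention of averaging, in log space, the miss rate sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space over [10^-2, 10^0]. Benchmark toolkits differ in how they sample the curve, so the function name `log_average_miss_rate` and the sampling rule here are illustrative assumptions, not any toolkit's official code.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at reference FPPI values.

    fppi must be sorted in ascending order; miss_rate[i] is the miss rate
    reached at fppi[i]. Sampling rule (an assumption of this sketch): use
    the last curve point whose FPPI does not exceed the reference value,
    or 1.0 if the curve has no such point.
    """
    refs = [10 ** (-2 + 2 * k / (num_points - 1)) for k in range(num_points)]
    sampled = []
    for ref in refs:
        candidates = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        sampled.append(candidates[-1] if candidates else 1.0)
    # Clamp to avoid log(0) when the curve reaches a zero miss rate.
    logs = [math.log(max(m, 1e-10)) for m in sampled]
    return math.exp(sum(logs) / len(logs))


# Hypothetical curve with a constant miss rate, so the average equals it.
print(log_average_miss_rate([0.001, 0.1, 1.0], [0.45, 0.45, 0.45]))
```

Lower values are better: the metric summarizes how many pedestrians are missed across a range of false-positive budgets.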

  • Conference Article
  • Cite Count Icon 50
  • 10.1109/cvpr.2012.6248045
Multi-pedestrian detection in crowded scenes: A global view
  • Jun 1, 2012
  • Junjie Yan + 3 more

Recent state-of-the-art algorithms have achieved good performance on normal pedestrian detection tasks. However, pedestrian detection in crowded scenes is still challenging due to the significant appearance variation caused by heavy occlusions and complex spatial interactions. In this paper we propose a unified probabilistic framework to globally describe multiple pedestrians in crowded scenes in terms of appearance and spatial interaction. We utilize a mixture model, where every pedestrian is assumed to belong to a special subclass and is described by the corresponding sub-model. Scores of pedestrian parts are used to represent appearance, and a quadratic kernel is used to represent relative spatial interaction. For efficient inference, multi-pedestrian detection is modeled as a MAP problem, and we utilize a greedy algorithm to obtain an approximation. For discriminative parameter learning, we formulate it as a learning-to-rank problem and propose Latent Rank SVM for learning from weakly labeled data. Experiments on various databases validate the effectiveness of the proposed approach.

  • Research Article
  • Cite Count Icon 12
  • 10.1016/j.neucom.2014.11.104
Joint components based pedestrian detection in crowded scenes using extended feature descriptors
  • Dec 15, 2015
  • Neurocomputing
  • Van-Dung Hoang + 1 more


  • Conference Article
  • Cite Count Icon 901
  • 10.1109/cvpr.2005.272
Pedestrian Detection in Crowded Scenes
  • Jun 20, 2005
  • B Leibe + 2 more

In this paper, we address the problem of detecting pedestrians in crowded real-world scenes with severe overlaps. Our basic premise is that this problem is too difficult for any type of model or feature alone. Instead, we present an algorithm that integrates evidence in multiple iterations and from different sources. The core part of our method is the combination of local and global cues via probabilistic top-down segmentation. Altogether, this approach allows examining and comparing object hypotheses with high precision down to the pixel level. Qualitative and quantitative results on a large data set confirm that our method is able to reliably detect pedestrians in crowded scenes, even when they overlap and partially occlude each other. In addition, the flexible nature of our approach allows it to operate on very small training sets.

  • Research Article
  • Cite Count Icon 3
  • 10.1007/s40010-015-0231-3
Using Weighted Part Model for Pedestrian Detection in Crowded Scenes Based on Image Segmentation
  • Dec 12, 2015
  • Proceedings of the National Academy of Sciences, India Section A: Physical Sciences
  • Jia Wen + 3 more

In this paper, we present a pedestrian detection approach for crowded scenes based on a weighted part model instead of the deformable part model. To avoid the low detection rate in crowded scenes, a cascaded pedestrian detection algorithm based on selective search segmentation is used. Compared with state-of-the-art methods on the PASCAL VOC2007 and PETS2009 datasets, the proposed method achieves better accuracy (42.3 and about 12 percent improvement) without decreasing the detection speed.

  • Research Article
  • Cite Count Icon 7
  • 10.3934/mbe.2023633
AD-DETR: DETR with asymmetrical relation and decoupled attention in crowded scenes.
  • Jan 1, 2023
  • Mathematical Biosciences and Engineering
  • Yueming Huang + 1 more

Pedestrian detection in crowded scenes is widely used in computer vision. However, it still has two difficulties: 1) eliminating repeated predictions (multiple predictions corresponding to the same object); 2) false detection and missing detection due to the high scene occlusion rate and the small visible area of detected pedestrians. This paper presents a detection framework based on DETR (detection transformer) to address the above problems, and the model is called AD-DETR (asymmetrical relation detection transformer). We find that the symmetry in a DETR framework causes synchronous prediction updates and duplicate predictions. Therefore, we propose an asymmetric relationship fusion mechanism and let each query asymmetrically fuse the relative relationships of surrounding predictions to learn to eliminate duplicate predictions. Then, we propose a decoupled cross-attention head that allows the model to learn to restrict the range of attention to focus more on visible regions and regions that contribute more to confidence. The method can reduce the noise information introduced by the occluded objects to reduce the false detection rate. Meanwhile, in our proposed asymmetric relations module, we establish a way to encode the relative relation between sets of attention points and improve the baseline. Without additional annotations, combined with the deformable-DETR with Res50 as the backbone, our method can achieve an average precision of 92.6%, MR$^{-2}$ of 40.0% and Jaccard index of 84.4% on the challenging CrowdHuman dataset. Our method exceeds previous methods, such as Iter-E2EDet (progressive end-to-end object detection), MIP (one proposal, multiple predictions), etc. Experiments show that our method can significantly improve the performance of the query-based model for crowded scenes, and it is highly robust for the crowded scene.

  • Conference Article
  • Cite Count Icon 200
  • 10.1109/cvpr42600.2020.01076
NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing
  • Jun 1, 2020
  • Xin Huang + 3 more

Although significant progress has been made in pedestrian detection recently, pedestrian detection in crowded scenes is still challenging. The heavy occlusion between pedestrians imposes great challenges to the standard Non-Maximum Suppression (NMS). A relatively low threshold of intersection over union (IoU) leads to missing highly overlapped pedestrians, while a higher one brings in plenty of false positives. To avoid such a dilemma, this paper proposes a novel Representative Region NMS (R2NMS) approach leveraging the less occluded visible parts, effectively removing the redundant boxes without bringing in many false positives. To acquire the visible parts, a novel Paired-Box Model (PBM) is proposed to simultaneously predict the full and visible boxes of a pedestrian. The full and visible boxes constitute a pair serving as the sample unit of the model, thus guaranteeing a strong correspondence between the two boxes throughout the detection pipeline. Moreover, convenient feature integration of the two boxes is allowed for better performance on both full and visible pedestrian detection tasks. Experiments on the challenging CrowdHuman and CityPersons benchmarks validate the effectiveness of the proposed approach on pedestrian detection in crowded situations.
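The idea of suppressing on visible regions rather than full boxes can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' R2NMS implementation: the function name `paired_nms`, the `use_visible` flag, and the toy boxes are hypothetical, and each detection is assumed to arrive as a (full box, visible box) pair.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def paired_nms(pairs, scores, iou_threshold, use_visible=True):
    """Greedy NMS over (full_box, visible_box) pairs.

    When use_visible is True, the overlap test uses the visible boxes;
    otherwise it uses the full boxes, as in standard NMS.
    Returns the indices of the kept pairs.
    """
    which = 1 if use_visible else 0
    order = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(pairs[i][which], pairs[j][which]) <= iou_threshold
               for j in keep):
            keep.append(i)
    return keep


# Hypothetical scene: pedestrian 1 is mostly hidden behind pedestrian 0, and
# index 2 is a duplicate detection of pedestrian 0.
pairs = [
    ((0, 0, 10, 20), (0, 0, 10, 20)),
    ((2, 0, 12, 20), (9, 0, 12, 10)),
    ((0.5, 0, 10.5, 20), (0.5, 0, 10.5, 20)),
]
scores = [0.9, 0.8, 0.7]

print(paired_nms(pairs, scores, 0.5, use_visible=False))
print(paired_nms(pairs, scores, 0.5, use_visible=True))
```

In this toy scene, full-box suppression at a threshold of 0.5 drops the occluded pedestrian along with the duplicate, whereas visible-box suppression keeps both pedestrians and still removes the duplicate.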

  • Research Article
  • Cite Count Icon 28
  • 10.1111/mice.12163
Kinect‐Based Pedestrian Detection for Crowded Scenes
  • Jul 24, 2015
  • Computer-Aided Civil and Infrastructure Engineering
  • Xiaofeng Chen + 2 more

Pedestrian movement data including volumes, walking speeds, and trajectories are essential in transportation engineering, planning, and research. Although traditional image‐based pedestrian detectors provide very rich information, their performance degrades quickly with increased occurrence of occlusion. The three‐dimensional sensing capabilities of Microsoft's Kinect present a potential cost‐effective solution for occlusion‐robust pedestrian detection. This article proposes an efficient pedestrian detection approach for crowded scenes by fusing RGB and depth images from the Kinect. More specifically, we first extract the pedestrian contour regions from RGB images using background subtraction. Then, we develop a region clustering algorithm to extract pedestrians from the contour regions using depth information. Finally, a tracking and counting algorithm is designed to acquire pedestrian volumes. The proposed approach was proven effective with an average detection accuracy of 93.1% at 20 frames per second. These results demonstrate the feasibility of using the low‐cost Kinect device for real‐world pedestrian detection in crowded scenes.

  • Research Article
  • Cite Count Icon 21
  • 10.1016/j.patcog.2022.108605
High quality proposal feature generation for crowded pedestrian detection
  • Feb 28, 2022
  • Pattern Recognition
  • Jing Wang + 4 more


  • Conference Article
  • Cite Count Icon 17
  • 10.1109/icip.2016.7532550
Pedestrian detection in crowded scenes via scale and occlusion analysis
  • Sep 1, 2016
  • Lu Wang + 2 more

Although significant progress has been made in pedestrian detection in recent years, detecting pedestrians in crowded scenes remains a challenging problem. In this paper, we propose to use visual contexts based on scale and occlusion cues from nearby detections to better detect pedestrians for surveillance applications. Specifically, we first apply detectors based on full body and parts to generate initial detections. The scale prior at each image location is estimated using the cues provided by neighboring detections, and the confidence score of each detection is refined according to its consistency with the estimated scale prior. Local occlusion analysis is exploited in refining detection confidence scores, which facilitates the final detection-cluster-based Non-Maximum Suppression. Experimental results on benchmark data sets show that the proposed algorithm performs favorably against the state-of-the-art methods.

  • Research Article
  • Cite Count Icon 10
  • 10.1109/access.2019.2928879
Learning Pixel-Level and Instance-Level Context-Aware Features for Pedestrian Detection in Crowds
  • Jan 1, 2019
  • IEEE Access
  • Chi Fei + 3 more

Pedestrian detection in crowded scenes is an intractable problem in computer vision, in which occlusion often presents a great challenge. In this paper, we propose a novel context-aware feature learning method for detecting pedestrians in crowds, with the purpose of making better use of context information for dealing with occlusion. Unlike most current pedestrian detectors that only extract context information from a single and fixed region, a new pixel-level context embedding module is developed to integrate multi-cue context into a deep CNN feature hierarchy, which enables access to the context of various regions by multi-branch convolution layers with different receptive fields. In addition, to utilize the distinctive visual characteristics formed by pedestrians that appear in groups and occlude each other, we propose a novel instance-level context prediction module, implemented as a two-person detector, to improve the one-person detection performance. By applying these strategies, we achieve an efficient and lightweight detector that can be trained in an end-to-end fashion. We evaluate the proposed approach on two popular pedestrian detection datasets, i.e., Caltech and CityPersons. The extensive experimental results demonstrate the effectiveness of the proposed method, especially under heavy occlusion cases.

  • Conference Article
  • Cite Count Icon 13
  • 10.5220/0004739105990604
English
  • Jan 1, 2014
  • Lu Wang + 3 more

Pedestrian detection is a challenging task for video surveillance. The problem becomes more difficult when occlusion is prevalent. In this paper, we extend a deformable part-based pedestrian detector to pedestrian detection in crowded scenes by considering both body part detection responses and detections' mutual spatial relationship. Specifically, we first decompose the full body detector into several body part detectors, whose detection responses can be computed efficiently from the response of the full body detector. Then, given the detection responses of the body part detectors, hypotheses are nominated by considering both detection scores and responses' mutual spatial relationship. Finally, a local optimization process is applied to make the final decision, where an objective function encouraging detections with high confidence, high discriminability and low conflict with other detections is proposed to select the best candidate detections. Experimental results show the effectiveness of the proposed approach.

  • Book Chapter
  • Cite Count Icon 501
  • 10.1007/978-3-030-01219-9_39
Occlusion-Aware R-CNN: Detecting Pedestrians in a Crowd
  • Jan 1, 2018
  • Shifeng Zhang + 4 more

Pedestrian detection in crowded scenes is a challenging problem since the pedestrians often gather together and occlude each other. In this paper, we propose a new occlusion-aware R-CNN (OR-CNN) to improve the detection accuracy in the crowd. Specifically, we design a new aggregation loss to enforce proposals to be close to and located compactly around the corresponding objects. Meanwhile, we use a new part occlusion-aware region of interest (PORoI) pooling unit to replace the RoI pooling layer in order to integrate the prior structure information of the human body with visibility prediction into the network to handle occlusion. Our detector is trained in an end-to-end fashion and achieves state-of-the-art results on three pedestrian detection datasets, i.e., CityPersons, ETH, and INRIA, and performs on par with the state of the art on Caltech.

  • Book Chapter
  • Cite Count Icon 33
  • 10.1007/978-3-319-48881-3_48
Unsupervised Deep Domain Adaptation for Pedestrian Detection
  • Jan 1, 2016
  • Lihang Liu + 4 more

This paper addresses the problem of unsupervised domain adaptation on the task of pedestrian detection in crowded scenes. First, we utilize an iterative algorithm to select and auto-annotate positive pedestrian samples with high confidence as the training samples for the target domain. Meanwhile, we also reuse negative samples from the source domain to compensate for the imbalance between the number of positive samples and negative samples. Second, based on the deep network we also design an unsupervised regularizer to mitigate the influence of data noise. More specifically, we transform the last fully connected layer into two sub-layers (an element-wise multiply layer and a sum layer) and add the unsupervised regularizer to further improve the domain adaptation accuracy. In experiments for pedestrian detection, the proposed method boosts the recall value by nearly 30% while the precision stays almost the same. Furthermore, we evaluate our method on standard domain adaptation benchmarks in both supervised and unsupervised settings and also achieve state-of-the-art results.
