Detecting pedestrians in crowds remains challenging, and more effort is needed to understand why detectors fail. When we perform an error analysis based on the traditional evaluation strategy, we find that it produces many misleading false positives that in fact cover occluded pedestrians. The reason is that datasets usually contain two kinds of annotations: regular pedestrians (detection targets), labeled by full-body boxes, and ignored pedestrians (NOT detection targets), labeled by visible boxes. Ignored pedestrians are labeled as an additional category termed the “ignore region”. Detectors, however, always predict a full-body box for each pedestrian. This gap leads to the following case: when a detector successfully predicts a full-body box for an ignored pedestrian, a false positive is triggered because of the low overlap between the predicted full-body box and the labeled visible box. The problem becomes more harmful as detectors improve and become more capable of locating occluded pedestrians. To alleviate this issue, we devise a new pedestrian detection pipeline that considers the additional visible box at both the detection and evaluation stages. During detection, we predict a visible box in addition to the full-body box for every instance; during evaluation, we employ visible boxes instead of full-body boxes to match the “ignore region”. We apply the new pipeline to dozens of detection methods and validate its effectiveness in reducing the over-reporting of false positives and providing more reliable evaluation results.
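The mismatch described above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function names (`iou`, `classify_detection`), the use of IoU as the overlap measure, the 0.5 threshold, and the box coordinates are all assumptions chosen for illustration, and the one-to-one, score-ordered matching of a real benchmark is omitted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def classify_detection(
    full_box: Box,
    visible_box: Box,
    targets: List[Box],
    ignore_regions: List[Box],
    use_visible_for_ignore: bool,
    thr: float = 0.5,  # assumed overlap threshold
) -> str:
    """Label one detection (simplified: no one-to-one or score-ordered matching)."""
    # Regular targets are annotated with full-body boxes, so match with full_box.
    if any(iou(full_box, t) >= thr for t in targets):
        return "true_positive"
    # Ignore regions are annotated with visible boxes. The traditional strategy
    # still compares the full-body prediction; the new one compares the
    # predicted visible box instead.
    probe = visible_box if use_visible_for_ignore else full_box
    if any(iou(probe, r) >= thr for r in ignore_regions):
        return "ignored"  # counted neither as true nor as false positive
    return "false_positive"


# An ignored, heavily occluded pedestrian: only the upper body is annotated.
ignore_region = (10.0, 10.0, 50.0, 60.0)
pred_full = (10.0, 10.0, 50.0, 130.0)    # correct full-body prediction
pred_visible = (10.0, 10.0, 50.0, 60.0)  # extra visible-box prediction

print(classify_detection(pred_full, pred_visible, [], [ignore_region], False))
# false_positive
print(classify_detection(pred_full, pred_visible, [], [ignore_region], True))
# ignored
```

In this toy case the full-body prediction overlaps the ignore region with an IoU of about 0.42, below the assumed threshold, so the traditional rule reports a false positive even though the detection is correct. Matching with the predicted visible box removes that spurious error.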