The diversity of pedestrian detectors proposed in recent years has encouraged several works to fuse them to achieve more accurate detection. The intuition is to combine the detectors based on their spatial consensus. The hypothesis is that a location indicated by multiple detectors has a high probability of actually belonging to a pedestrian, whereas false positive regions have little consensus among detectors (small support), which allows the false positives in these regions to be discarded. We propose a novel method called Content-Based Spatial Consensus (CSBC), which, in addition to relying on spatial consensus, considers the content of the detection windows to learn a weighted fusion of pedestrian detectors. The result is a reduction in false alarms and an improvement in detection. In this work, we also demonstrate that the feature used to learn the contents of the windows of each detector has little influence on the results, which allows our method to remain effective even when employing simple features. The CSBC outperforms state-of-the-art fusion methods on the ETH and Caltech datasets. Moreover, our method is more efficient, since fewer detectors are necessary to achieve strong results.
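The spatial-consensus hypothesis described above can be illustrated with a minimal sketch. This is not the CSBC method itself: the per-detector weights are fixed by hand here, whereas the abstract states that they are learned from window content. The names `iou`, `fuse`, `min_iou`, and the choice of a single "root" detector whose windows are re-scored are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (window, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(
    root: List[Detection],
    others: Dict[str, List[Detection]],
    weights: Dict[str, float],
    min_iou: float = 0.5,
) -> List[Detection]:
    """Re-score each root window by the support it gets from other detectors.

    Hypothetical scheme: a window gains weight * (best overlapping score)
    from every other detector that agrees with it. Windows with no support
    keep their original score and so rank lower after fusion.
    """
    fused = []
    for box, score in root:
        total = score
        for name, detections in others.items():
            support = [s for b, s in detections if iou(box, b) >= min_iou]
            if support:
                # Hand-set weight; an assumption standing in for a learned one.
                total += weights.get(name, 1.0) * max(support)
        fused.append((box, total))
    return fused


# Two root windows with equal scores; only the first is confirmed by "B".
root_dets = [((0, 0, 10, 20), 0.6), ((50, 50, 60, 70), 0.6)]
other_dets = {"B": [((1, 0, 11, 20), 0.8)]}
result = fuse(root_dets, other_dets, weights={"B": 0.5})
```

In this toy case the window confirmed by the second detector ends up with a higher fused score than the unsupported one, so a single threshold on the fused score can discard the likely false positive.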