Abstract

Modern anchor-based face detectors learn discriminative features using large-capacity networks and extensive anchor settings. In spite of their promising results, they are not without problems. First, most anchors extract redundant features from the background. As a consequence, the performance improvements are achieved at the expense of a disproportionate computational complexity. Second, the predicted face boxes are only distinguished by a classifier supervised by pre-defined positive, negative and ignored anchors. This strategy may ignore potential contributions from cohorts of anchors labeled negative/ignored during inference simply because of their inferior initialisation, although they can regress well to a target. In other words, true positives and representative features may get filtered out by unreliable confidence scores. To deal with the first concern and achieve more efficient face detection, we propose a Heatmap-assisted Spatial Attention (HSA) module and a Scale-aware Layer Attention (SLA) module to extract informative features using lower computational costs. To be specific, SLA incorporates the information from all the feature pyramid layers, weighted adaptively to remove redundant layers. HSA predicts a reshaped Gaussian heatmap and employs it to facilitate a spatial feature selection by better highlighting facial areas. For more reliable decision-making, we merge the predicted heatmap scores and classification results by voting. Since our heatmap scores are based on the distance to the face centres, they are able to retain all the well-regressed anchors. The experiments obtained on several well-known benchmarks demonstrate the merits of the proposed method.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call