Abstract

Semi-supervised learning aims to improve model performance by exploiting large amounts of unlabelled data, thereby reducing the cost of labelling. For joint pedestrian and face detection in real-world scenarios, existing semi-supervised object detection methods rarely consider the category relevance between samples, resulting in unsatisfactory classification performance. Nor can existing semi-supervised methods effectively integrate the categories from two datasets into a single ensemble network. In this work, a novel approach that aims to fully utilise category-relevant information is proposed. First, multi-teacher distillation that decouples the pedestrian and face categories is introduced to eliminate category unfairness in the distillation process. Second, a coupled attention module embedded in the classification head of the student network is proposed to better capture the relevance between the categories learned from the teachers and to guide distillation. Moreover, a constraint loss is designed to stabilise training and improve convergence, so as to tailor a versatile student. Experimental results on the CrowdHuman and WiderFace benchmarks demonstrate the superiority of the approach over state-of-the-art methods.

